Journal of Cheminformatics: Latest Articles

The algebraic extended atom-type graph-based model for precise ligand–receptor binding affinity prediction
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-22 | DOI: 10.1186/s13321-025-00955-z
Farjana Tasnim Mukta, Md Masud Rana, Avery Meyer, Sally Ellingson, Duc D. Nguyen

Accurate prediction of ligand-receptor binding affinity is crucial in structure-based drug design, significantly impacting the development of effective drugs. Recent advances in machine learning (ML)–based scoring functions have improved these predictions, yet challenges remain in modeling complex molecular interactions. This study introduces the AGL-EAT-Score, a scoring function that integrates extended atom-type multiscale weighted colored subgraphs with algebraic graph theory. This approach leverages the eigenvalues and eigenvectors of graph Laplacian and adjacency matrices to capture high-level details of specific atom pairwise interactions. Evaluated against benchmark datasets such as CASF-2016, CASF-2013, and the Cathepsin S dataset, the AGL-EAT-Score demonstrates notable accuracy, outperforming existing traditional and ML-based methods. The model’s strength lies in its comprehensive similarity analysis, examining protein sequence, ligand structure, and binding site similarities, thus ensuring minimal bias and over-representation in the training sets. The use of extended atom types in graph coloring enhances the model’s capability to capture the intricacies of protein-ligand interactions. The AGL-EAT-Score marks a significant advancement in drug design, offering a tool that could potentially refine and accelerate the drug discovery process.

Scientific Contribution

The AGL-EAT-Score presents an algebraic graph-based framework that predicts ligand-receptor binding affinity by constructing multiscale weighted colored subgraphs from the 3D structure of protein-ligand complexes. It improves prediction accuracy by modeling interactions between extended atom types, addressing challenges like dataset bias and over-representation. Benchmark evaluations demonstrate that AGL-EAT-Score outperforms existing methods, offering a robust and systematic tool for structure-based drug design.
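
As a rough illustration of the algebraic-graph idea described above, the sketch below builds a weighted subgraph between protein atoms and ligand atoms of one atom-type pair, forms its Laplacian, and summarizes the eigenvalues as features. The exponential kernel, the parameters `eta`, `kappa`, and `cutoff`, and the function name `subgraph_laplacian_features` are assumptions made for illustration; the abstract does not specify the kernels, atom types, or descriptors used by AGL-EAT-Score.

```python
import numpy as np


def subgraph_laplacian_features(protein_xyz, ligand_xyz, eta=3.0, kappa=2.0, cutoff=12.0):
    """Eigenvalue statistics of a weighted protein-ligand subgraph Laplacian.

    Assumed setup (not taken from the paper): edges connect only protein
    atoms to ligand atoms of one atom-type pair, and each edge is weighted
    by exp(-(r / eta) ** kappa) when the distance r is within `cutoff`.
    """
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    n_p, n_l = len(protein_xyz), len(ligand_xyz)

    # Pairwise protein-ligand distances, shape (n_p, n_l).
    dists = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    weights = np.where(dists <= cutoff, np.exp(-(dists / eta) ** kappa), 0.0)

    # Bipartite adjacency: no protein-protein or ligand-ligand edges.
    n = n_p + n_l
    adjacency = np.zeros((n, n))
    adjacency[:n_p, n_p:] = weights
    adjacency[n_p:, :n_p] = weights.T

    # Laplacian L = D - A, where D holds the weighted degrees.
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues = np.linalg.eigvalsh(laplacian)  # symmetric matrix

    positive = eigenvalues[eigenvalues > 1e-10]
    return {
        "sum": float(eigenvalues.sum()),
        "max": float(eigenvalues.max()),
        "min_positive": float(positive.min()) if positive.size else 0.0,
        "mean": float(eigenvalues.mean()),
    }
```

In a full scoring function, statistics like these would presumably be collected over many atom-type pairs and kernel scales and passed to a regression model; that step is omitted here.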

Citations: 0
StreamChol: a web-based application for predicting cholestasis
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-21 | DOI: 10.1186/s13321-024-00943-9
Pablo Rodríguez-Belenguer, Emilio Soria-Olivas, Manuel Pastor

This article introduces StreamChol, software for developing and applying mechanistic models to predict cholestasis. StreamChol is a Streamlit application, usable as a desktop application or as web-accessible software when installed on a server using a Docker container.

StreamChol allows a seamless integration of pharmacokinetic analyses with Machine Learning models. This integration not only enables cholestasis prediction but also opens avenues for predicting other toxicological endpoints requiring similar integrations. StreamChol's Docker containerization also streamlines deployment across diverse environments, addressing potential compatibility issues. StreamChol is distributed as open-source under GNU GPL v3, reflecting our commitment to open science. Through StreamChol, researchers are offered a potent tool for predictive modelling in toxicology, harnessing its strengths within an intuitive and user-friendly interface, without the need for any programming knowledge.

Scientific contribution

This work offers a user-friendly web-based tool for cholestasis prediction and a complete workflow for creating web platforms that require combining two programming languages, R and Python.

Citations: 0
Matched pairs demonstrate robustness against inter-assay variability
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-20 | DOI: 10.1186/s13321-025-00956-y
Jochem Nelen, Horacio Pérez-Sánchez, Hans De Winter, Dries Van Rompaey

Machine learning models for chemistry require large datasets, often compiled by combining data from multiple assays. However, combining data without careful curation can introduce significant noise. While absolute values from different assays are rarely comparable, trends or differences between compounds are often assumed to be consistent. This study evaluates that assumption by analyzing potency differences between matched compound pairs across assays and assessing the impact of assay metadata curation on error reduction. We find that potency differences between matched pairs exhibit less variability than individual compound measurements, suggesting systematic assay differences may partially cancel out in paired data. Metadata curation further improves inter-assay agreement, albeit at the cost of dataset size. For minimally curated compound pairs, agreement within 0.3 pChEMBL units was found to be 44–46% for Ki and IC50 values, respectively, which improved to 66–79% after curation. Similarly, the percentage of pairs with differences exceeding 1 pChEMBL unit dropped from 12–15% to 6–8% with extensive curation. These results establish a benchmark for expected noise in matched molecular pair data from the ChEMBL database, offering practical metrics for data quality assessment.
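
To make the agreement metrics above concrete, here is a minimal sketch of how the share of matched pairs agreeing within 0.3 pChEMBL units, or differing by more than 1 unit, could be computed between two assays. The function name `pair_agreement`, the compound identifiers, and all potency values are invented for illustration; the study's actual pairing and curation procedure is not described in the abstract.

```python
def pair_agreement(assay_1, assay_2, pairs, tight=0.3, loose=1.0):
    """Compare matched-pair potency differences between two assays.

    assay_1, assay_2: dicts mapping compound id -> pChEMBL value.
    pairs: iterable of (compound_a, compound_b) measured in both assays.
    Returns (fraction of pairs within `tight`, fraction beyond `loose`).
    """
    deviations = []
    for a, b in pairs:
        delta_1 = assay_1[a] - assay_1[b]  # potency difference in assay 1
        delta_2 = assay_2[a] - assay_2[b]  # same pair measured in assay 2
        deviations.append(abs(delta_1 - delta_2))
    if not deviations:
        raise ValueError("no pairs to compare")
    n = len(deviations)
    within = sum(d <= tight for d in deviations) / n
    beyond = sum(d > loose for d in deviations) / n
    return within, beyond


# Toy values (invented): assay 2 reads 0.5 units higher for c1-c3,
# while c4 is an outlier that reads a full unit lower.
assay_1 = {"c1": 6.0, "c2": 7.0, "c3": 5.0, "c4": 8.0}
assay_2 = {"c1": 6.5, "c2": 7.5, "c3": 5.5, "c4": 7.0}
pairs = [("c1", "c2"), ("c1", "c3"), ("c2", "c4")]
within, beyond = pair_agreement(assay_1, assay_2, pairs)
```

In this toy case the constant 0.5-unit offset cancels for the first two pairs, so two of the three pairs agree within 0.3 units, while the pair involving `c4` deviates by 1.5 units.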

Citations: 0
Chemical space as a unifying theme for chemistry
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-16 | DOI: 10.1186/s13321-025-00954-0
Jean-Louis Reymond

Chemistry has diversified from a basic understanding of the elements to studying millions of highly diverse molecules and materials, which together are conceptualized as the chemical space. A map of this chemical space where distances represent similarities between compounds can represent the mutual relationships between different subfields of chemistry and help the discipline to be viewed and understood globally.

Citations: 0
One size does not fit all: revising traditional paradigms for assessing accuracy of QSAR models used for virtual screening
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-16 | DOI: 10.1186/s13321-025-00948-y
James Wellnitz, Sankalp Jain, Joshua E. Hochuli, Travis Maxfield, Eugene N. Muratov, Alexander Tropsha, Alexey V. Zakharov

Traditional best practices for quantitative structure–activity relationship (QSAR) modeling recommend dataset balancing and balanced accuracy (BA) as the key desired objective of model development. This study explores the value of the conventional norms in the context of using QSAR models for virtual screening of modern large and ultra-large chemical libraries. For this increasingly common task, we now recommend the use of models with the highest positive predictive value (PPV) built on imbalanced training sets as preferred virtual screening tools. This recommendation stems from practical considerations of how the results of virtual screening are used in experimental laboratories, where only a small fraction of virtually screened molecules can be tested using standard well plates. As a proof of concept, we have developed QSAR models for five expansive datasets with different ratios of active and inactive molecules and compared model performance in virtual screening using BA, PPV, and other metrics. We show that training on imbalanced datasets achieves a hit rate at least 30% higher than using balanced datasets, and that the PPV metric captured this difference in performance with no parameter tuning. Importantly, hit rates were estimated for top-scoring compounds organized in batches of the size of plates (for instance, 128 molecules) used in experimental high-throughput screening. Based on the results of our studies, we posit that QSAR models trained on imbalanced datasets with the highest PPV should be relied upon to identify and test hit compounds in early drug discovery studies.
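
The two metrics contrasted above can be sketched in a few lines. The helper names `balanced_accuracy` and `ppv_at_top_n` and the toy labels and scores are assumptions for illustration; the default batch size of 128 follows the plate size mentioned in the abstract, and this is not the authors' evaluation code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one active and one inactive compound")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def ppv_at_top_n(y_true, scores, n=128):
    """Fraction of true actives among the n highest-scoring compounds."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)[:n]
    if not ranked:
        raise ValueError("no compounds to rank")
    return sum(label for _, label in ranked) / len(ranked)


# Toy screening results (invented): 2 actives among 6 compounds.
y_true = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

ba = balanced_accuracy(y_true, y_pred)
hit_rate_top_2 = ppv_at_top_n(y_true, scores, n=2)
```

BA weighs every active and inactive in the library, whereas PPV at top N only looks at the compounds that would actually fit on a plate, which is the practical quantity the abstract argues for.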

Citations: 0
Context-dependent similarity analysis of analogue series for structure–activity relationship transfer based on a concept from natural language processing
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-15 | DOI: 10.1186/s13321-025-00951-3
Atsushi Yoshimori, Jürgen Bajorath

Analogue series (AS) are generated during compound optimization in medicinal chemistry and are the major source of structure–activity relationship (SAR) information. Pairs of active AS consisting of compounds with corresponding substituents and comparable potency progression represent SAR transfer events for the same target or across different targets. We report a new computational approach to systematically search for SAR transfer series that combines an AS alignment algorithm with context-dependent similarity assessment based on vector embeddings adapted from natural language processing. The methodology comprehensively accounts for substituent similarity, identifies non-classical bioisosteres, captures substituent-property relationships, and generates accurate AS alignments. Context-dependent similarity assessment is conceptually novel in computational medicinal chemistry and should also be of interest for other applications.

Scientific contribution

A method is reported to systematically search for and align analogue series with SAR transfer potential. Central to the approach is the assessment of context-dependent similarity for substituents, a new concept in cheminformatics, which is based upon vector embeddings and word pair relationships adapted from natural language processing.
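
The notion of context-dependent similarity can be illustrated with a deliberately simple stand-in: count-based context vectors compared by cosine similarity. This is not the embedding model used in the paper, which the abstract does not specify. Here each analogue series is treated as a "sentence" of substituents ordered by potency, and the function names, the window size, and the toy series are all assumptions.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(series_list, window=1):
    """Count neighbouring substituents within `window` positions in each series."""
    vectors = defaultdict(Counter)
    for series in series_list:
        for i, substituent in enumerate(series):
            lo, hi = max(0, i - window), min(len(series), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[substituent][series[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Toy analogue series (invented), each ordered by increasing potency.
series_list = [
    ["H", "F", "Cl", "CF3"],
    ["H", "F", "Br", "CF3"],
    ["H", "OMe", "OEt"],
]
vectors = context_vectors(series_list)
similarity_cl_br = cosine(vectors["Cl"], vectors["Br"])
similarity_cl_oet = cosine(vectors["Cl"], vectors["OEt"])
```

In this toy data, Cl and Br occupy the same position between F and CF3, so their context vectors coincide and the similarity is high even though the two substituents never appear in the same series; Cl and OEt share no context and score zero.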

Citations: 0
Fragmenstein: predicting protein–ligand structures of compounds derived from known crystallographic fragment hits using a strict conserved-binding–based methodology
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-13 | DOI: 10.1186/s13321-025-00946-0
Matteo P. Ferla, Rubén Sánchez-García, Rachael E. Skyner, Stefan Gahbauer, Jenny C. Taylor, Frank von Delft, Brian D. Marsden, Charlotte M. Deane

Current strategies centred on either merging or linking initial hits from fragment-based drug design (FBDD) crystallographic screens generally do not fully leverage 3D structural information. We show that an algorithmic approach (Fragmenstein) that ‘stitches’ the ligand atoms from this structural information together can provide more accurate and reliable predictions for protein–ligand complex conformation than general methods such as pharmacophore-constrained docking. This approach works under the assumption of conserved binding: when a larger molecule is designed containing the initial fragment hit, the common substructure between the two will adopt the same binding mode. Fragmenstein either takes the atomic coordinates of ligands from an experimental fragment screen and combines the atoms together to produce a novel merged virtual compound, or uses them to predict the bound complex for a provided molecule. The molecule is then energy minimised under strong constraints to obtain a structurally plausible conformer. The code is available at https://github.com/oxpig/Fragmenstein.

Scientific contribution

This work shows the importance of using the coordinates of known binders when predicting the conformation of derivative molecules through a retrospective analysis of the COVID Moonshot data. This method has had a prior real-world application in hit-to-lead screening, yielding a sub-micromolar merger from parent hits in a single round. It is therefore likely to further benefit future drug design campaigns and be integrated in future pipelines.
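
The positional 'stitching' idea can be illustrated with a toy, purely geometric sketch: atoms from two fragment hits that nearly coincide in space are collapsed into one, and everything else is kept at its crystallographic position. This does not use the Fragmenstein package or reproduce its algorithm (bonding, ring handling, and the constrained energy minimisation are all omitted); the function name `merge_overlapping_atoms`, the 1.0 Å threshold, and the coordinates are assumptions.

```python
from math import dist


def merge_overlapping_atoms(frag_a, frag_b, threshold=1.0):
    """Combine two fragments' atoms, collapsing pairs closer than `threshold` angstroms.

    Each fragment is a list of (element, (x, y, z)) tuples. An atom from
    frag_b that overlaps a not-yet-matched atom of the same element in
    frag_a is merged into it at the midpoint; other atoms are kept as-is.
    """
    merged = [(element, tuple(xyz)) for element, xyz in frag_a]
    used = set()
    for element_b, xyz_b in frag_b:
        match = None
        for i, (element_a, xyz_a) in enumerate(frag_a):
            if i not in used and element_a == element_b and dist(xyz_a, xyz_b) < threshold:
                match = i
                break
        if match is None:
            merged.append((element_b, tuple(xyz_b)))
        else:
            used.add(match)
            xyz_a = frag_a[match][1]
            midpoint = tuple((p + q) / 2 for p, q in zip(xyz_a, xyz_b))
            merged[match] = (element_b, midpoint)
    return merged


# Toy fragments (invented coordinates, in angstroms).
frag_a = [("C", (0.0, 0.0, 0.0)), ("C", (1.5, 0.0, 0.0))]
frag_b = [("C", (1.7, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0))]
merged = merge_overlapping_atoms(frag_a, frag_b)
```

Here the two nearly coincident carbons become a single atom, giving a three-atom merged set whose positions stay close to where the parent fragments were observed, which is the essence of the conserved-binding assumption.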

Citations: 0
ADMET evaluation in drug discovery: 21. Application and industrial validation of machine learning algorithms for Caco-2 permeability prediction
IF 7.1 | Zone 2, Chemistry | Q1 CHEMISTRY, MULTIDISCIPLINARY | Pub Date: 2025-01-10 | DOI: 10.1186/s13321-025-00947-z
Dong Wang, Jieyu Jin, Guqin Shi, Jingxiao Bao, Zheng Wang, Shimeng Li, Peichen Pan, Dan Li, Yu Kang, Tingjun Hou

The Caco-2 cell model has been widely used to assess the intestinal permeability of drug candidates in vitro, owing to its morphological and functional similarity to human enterocytes. While the Caco-2 cell assay is considered safe and cost-effective, it is also time-consuming. Therefore, computational models that achieve high accuracy in predicting Caco-2 permeability are crucial for enhancing the efficiency of oral drug development. In this study, we conducted an in-depth analysis of the characteristics of an augmented Caco-2 permeability dataset, and evaluated a diverse range of machine learning algorithms in combination with different molecular representations. The results indicated that XGBoost generally provided better predictions than comparable models for the test sets. In addition, we investigated the transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets. Our findings, based on Shanghai Qilu’s in-house dataset, showed that the boosting models retained a degree of predictive efficacy when applied to industry data. Furthermore, a Y-randomization test and applicability domain analysis were employed to assess the robustness and generalizability of these models. Matched Molecular Pair Analysis (MMPA) was utilized to extract chemical transformation rules. We believe that the model developed in this study could represent a reliable tool for assessing Caco-2 permeability during early-stage drug discovery and the chemical transformation rules derived here could provide insights for optimizing Caco-2 permeability.

Scientific contribution

A comprehensive validation of various machine learning algorithms combined with diverse molecular representations on a large dataset for predicting Caco-2 permeability was reported. The transferability of machine learning models trained on publicly available data to internal pharmaceutical industry datasets was also investigated. Matched molecular pair analysis was carried out to provide reasonable suggestions for researchers to improve the Caco-2 permeability of compounds.
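
The Y-randomization test mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a one-descriptor least-squares line stands in for the boosting models, and the toy descriptor values, the toy permeability values, and the function names (`fit_line`, `r_squared`, `y_randomization`) are all hypothetical. The idea is that a model refitted on shuffled labels should score far worse than the same model fitted on the true labels; if it does not, the original fit was probably a chance correlation.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the fitted line on its own training data."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def y_randomization(xs, ys, n_rounds=100, seed=0):
    """Return (R^2 on the true labels, list of R^2 values on shuffled labels)."""
    rng = random.Random(seed)
    true_score = r_squared(xs, ys)
    shuffled_scores = []
    for _ in range(n_rounds):
        permuted = ys[:]       # copy so the true labels stay untouched
        rng.shuffle(permuted)  # break the descriptor-label relationship
        shuffled_scores.append(r_squared(xs, permuted))
    return true_score, shuffled_scores


# Toy data (hypothetical, not taken from the paper): one descriptor that is
# linearly related to a log-permeability-like value, plus Gaussian noise.
noise = random.Random(42)
descriptor = [i / 10 for i in range(50)]
permeability = [-6.0 + 0.3 * x + noise.gauss(0, 0.1) for x in descriptor]

true_r2, null_r2 = y_randomization(descriptor, permeability)
print(f"R^2 on true labels:      {true_r2:.3f}")
print(f"max R^2 on shuffled y:   {max(null_r2):.3f}")
```

A large gap between the two printed values is what the test looks for. In practice the same loop would wrap whatever model and validation metric the study actually uses.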

Citations: 0
CLAIRE: a contrastive learning-based predictor for EC number of chemical reactions
IF 7.1 Zone 2 Chemistry Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date: 2025-01-07 DOI: 10.1186/s13321-024-00944-8
Zishuo Zeng, Jin Guo, Jiao Jin, Xiaozhou Luo

Predicting EC numbers for chemical reactions enables efficient enzymatic annotation for computer-aided synthesis planning. However, conventional machine learning approaches are hampered by data scarcity and class imbalance. Here, we introduce CLAIRE (Contrastive Learning-based AnnotatIon for Reaction’s EC), a novel framework that leverages contrastive learning, pre-trained language model-based reaction embeddings, and data augmentation to address these limitations. CLAIRE achieved weighted average F1 scores of 0.861 and 0.911 on the testing set (n = 18,816) and an independent dataset (n = 1040) derived from yeast’s metabolic model, respectively, outperforming the state-of-the-art model by 3.65-fold and 1.18-fold, respectively. Its high accuracy positions CLAIRE as a promising tool for retrosynthesis planning, drug fate prediction, and synthetic biology applications. CLAIRE is freely available on GitHub (https://github.com/zishuozeng/CLAIRE).

Scientific contribution

This work employs contrastive learning to predict the EC numbers of enzymatic reactions, overcoming the challenges of data scarcity and class imbalance. The new model achieves state-of-the-art performance and may facilitate computer-aided synthesis planning.
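
For readers unfamiliar with the metric quoted in the abstract, the sketch below shows how a weighted average F1 score is commonly computed: each class's F1 is weighted by the number of true examples of that class. This is an illustration of the usual support-weighted convention, not code from the CLAIRE repository; the function name `weighted_f1` and the toy EC labels are hypothetical, and whether the paper uses exactly this convention is an assumption.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted this label and it was correct.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0  # class never predicted correctly
        score += (n_true / total) * f1
    return score


# Hypothetical toy labels: three EC classes with unequal support.
y_true = ["1.1.1.1", "1.1.1.1", "1.1.1.1", "2.7.1.1", "2.7.1.1", "3.1.1.1"]
y_pred = ["1.1.1.1", "1.1.1.1", "2.7.1.1", "2.7.1.1", "2.7.1.1", "1.1.1.1"]

score = weighted_f1(y_true, y_pred)
print(f"weighted F1 = {score:.3f}")
```

Weighting by support means that frequent classes dominate the score, which is worth keeping in mind when the label distribution is as imbalanced as the abstract describes.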

Citations: 0
Prediction of Pt, Ir, Ru, and Rh complexes light absorption in the therapeutic window for phototherapy using machine learning
IF 7.1 Zone 2 Chemistry Q1 CHEMISTRY, MULTIDISCIPLINARY Pub Date: 2025-01-05 DOI: 10.1186/s13321-024-00939-5
V. Vigna, T. F. G. G. Cova, A. A. C. C. Pais, E. Sicilia

Effective light-based cancer treatments, such as photodynamic therapy (PDT) and photoactivated chemotherapy (PACT), rely on compounds that are efficiently activated by light and absorb within the therapeutic window (600–850 nm). Traditional methods for predicting these light absorption properties, including Time-Dependent Density Functional Theory (TDDFT), are often computationally intensive and time-consuming. In this study, we explore a machine learning (ML) approach to predict light absorption in the therapeutic window for platinum, iridium, ruthenium, and rhodium complexes, with the aim of streamlining the screening of potential photoactivatable prodrugs. By compiling a dataset of 9775 complexes from the Reaxys database, we trained six classification models, including random forests, support vector machines, and neural networks, using various molecular descriptors. Our findings indicate that the Extreme Gradient Boosting Classifier (XGBC) paired with AtomPairs2D descriptors delivers the highest predictive accuracy and robustness. This ML-based method significantly accelerates the identification of suitable compounds, providing a valuable tool for the early-stage design and development of phototherapy drugs. The method also allows relevant structural characteristics of a base molecule to be modified using information from the supervised approach.

Scientific Contribution: The proposed machine learning (ML) approach predicts the ability of transition metal-based complexes to absorb light in the UV–vis therapeutic window, a key trait for phototherapeutic agents. While ML models have been used to predict UV–vis properties of organic molecules, applying this to metal complexes is novel. The model is efficient, fast, and resource-light, using decision tree-based algorithms that provide interpretable results. This interpretability helps to understand classification rules and facilitates targeted structural modifications to convert inactive complexes into potentially active ones.
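
The classification target described above can be pictured with a small labeling sketch. The abstract gives only the window itself (600–850 nm); the rule used here, that a complex counts as active if any reported absorption maximum falls inside the window, is an assumption, as are the function name `absorbs_in_window` and the toy complexes and wavelengths.

```python
# Therapeutic window bounds in nanometres, as stated in the abstract.
THERAPEUTIC_WINDOW_NM = (600.0, 850.0)


def absorbs_in_window(absorption_maxima_nm, window=THERAPEUTIC_WINDOW_NM):
    """Return True if any absorption maximum lies inside the window (inclusive)."""
    low, high = window
    return any(low <= wavelength <= high for wavelength in absorption_maxima_nm)


# Hypothetical complexes with made-up absorption maxima (nm).
complexes = {
    "complex_A": [450.0, 620.0],  # one band inside the window
    "complex_B": [380.0, 540.0],  # all bands below the window
    "complex_C": [860.0],         # just above the window
}

# Binary labels of the kind a classifier could be trained on.
labels = {name: int(absorbs_in_window(maxima)) for name, maxima in complexes.items()}
print(labels)
```

Labels of this form, paired with molecular descriptors, are the kind of input a binary classifier such as a gradient boosting model would be trained on.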

Citations: 0