Journal of Chemical Information and Modeling: Latest Articles, Page 7

Agentic Knowledge Graphs of the LiFePO4 Cathode for Lithium Ion Battery: Balancing Discovery and Stability with LLMs.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-22 DOI: 10.1021/acs.jcim.5c02572

Lee Bin Choi,Ohyeon Lee,Sanghun Lee

Lithium iron phosphate (LiFePO4, LFP) has regained prominence as a cathode for lithium ion batteries thanks to its intrinsic safety, thermal stability, long cycle life, and cost advantages. We present an agentic knowledge-graph pipeline that converts titles/abstracts into directed, signed agent → property relations. Using a Scopus corpus of the 9500 most-cited LFP journal articles (2000-present), we benchmark three matched modes: A, rules with a closed vocabulary; B, LLM-only with an open vocabulary; and C, mixed LLM with a hybrid vocabulary. A yields a compact, high-precision core; B expands recall but increases label dispersion; C preserves much of B's breadth while maintaining schema alignment via canonicalization and role gating. Robustness tests with eight bootstrap passes show rapid convergence: requiring recurrence across ∼6 passes plus a modest publication-support threshold yields a compact, high-confidence backbone. The resulting network is predominantly positive and centers on transport and interfacial outcomes, with a small number of mixed and negative ties indicating condition dependence. Beyond LFP, the workflow can be adapted to other battery chemistries with modest retuning of vocabularies and projection rules alongside routine validation on held-out annotations, enabling a stability-aware, literature-scale synthesis of direction-of-effect relations.
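
The recurrence filter described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the tuple layout, the function name `stable_backbone`, and the default `min_papers=2` are hypothetical, while the "about 6 of 8 passes" threshold comes from the abstract.

```python
from collections import Counter, defaultdict


def stable_backbone(passes, min_passes=6, min_papers=2):
    """Keep signed edges that recur across bootstrap passes.

    passes: list of passes; each pass is an iterable of
            (agent, property, sign, paper_id) tuples (hypothetical layout).
    min_passes: number of passes an edge must appear in (about 6 of 8 in the abstract).
    min_papers: assumed publication-support threshold (distinct papers).
    """
    pass_counts = Counter()          # edge -> number of passes containing it
    supporting = defaultdict(set)    # edge -> distinct supporting paper ids

    for extracted in passes:
        seen_in_pass = set()
        for agent, prop, sign, paper_id in extracted:
            edge = (agent, prop, sign)
            seen_in_pass.add(edge)
            supporting[edge].add(paper_id)
        # Count each edge at most once per pass.
        pass_counts.update(seen_in_pass)

    return {
        edge
        for edge, n in pass_counts.items()
        if n >= min_passes and len(supporting[edge]) >= min_papers
    }
```

Counting an edge once per pass keeps a single verbose abstract from inflating recurrence, and the separate paper-support check keeps an edge from passing on the strength of one publication alone.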



Citations: 0

CoDrug: A Text-Driven Molecular Virtual Screening and Multiproperty Optimization Framework via Multimodal Language Model.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-22 DOI: 10.1021/acs.jcim.5c02499

Rui Gu,Yingxu Liu,Bingxing Zhu,Li Liang,Haichun Liu,Yanmin Zhang,Yadong Chen

Traditional molecular screening methods are often limited by high computational cost, long design cycles, and a strong reliance on high-quality 3D protein structures, which are not always available or reliable. To address these limitations, we propose CoDrug, an innovative multimodal fusion framework that integrates textual information with structural representations of proteins and compounds. CoDrug employs two complementary fusion strategies: text-protein sequence fusion, in which SciBERT encodes functional descriptions and ESM extracts sequence-level features, and text-compound structure fusion, in which ChemFormer encodes SMILES and SciBERT processes compound-related textual descriptions. Using contrastive learning, CoDrug aligns textual and structural embeddings in a shared latent space, enabling effective cross-modal representation learning. This architecture supports novel functionalities, including text-driven virtual screening and text-driven molecular optimization, enhancing representation expressiveness and generalization while delivering strong performance under zero-shot settings. Evaluations on diverse benchmarks demonstrate that CoDrug achieves competitive or superior results compared with state-of-the-art baselines, particularly when 3D structural data are incomplete or unavailable. The framework's natural language interface lowers the technical barrier for AI-assisted drug discovery, allowing chemists to efficiently navigate and optimize chemical space without specialized computational expertise. By bridging language-driven hypotheses and structure-guided molecular design, CoDrug offers a scalable and flexible paradigm for accelerating the early stages of drug discovery.
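
Once text and structure embeddings share a latent space, text-driven screening reduces to a nearest-neighbour ranking. The sketch below illustrates that idea only; it assumes the embeddings have already been produced by some encoder, and the names `cosine` and `rank_by_text_query` are hypothetical rather than part of CoDrug.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def rank_by_text_query(query_embedding, compound_embeddings, top_k=3):
    """Rank compounds by similarity to a text query in the shared space.

    compound_embeddings: dict mapping a compound name to its embedding
    (assumed to come from the structure encoder after alignment).
    Returns the top_k (name, similarity) pairs, best first.
    """
    scored = [
        (name, cosine(query_embedding, emb))
        for name, emb in compound_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Cosine similarity is used here because contrastive objectives are commonly trained on normalized embeddings; whether CoDrug scores candidates this way is not stated in the abstract.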



Citations: 0

Graph-Based Deep Learning Models for Predicting pKa Values of Protein-Ionizable Residues via Physically Inspired Feature Engineering.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-22 DOI: 10.1021/acs.jcim.5c01681

Ziyu Song,Ruixuan Wang,Xun Jiao,Zuyi Huang

The pKa value of a protein-ionizable residue reflects its propensity to donate a proton at a given pH value, which is essential for understanding a wide range of biological activity. Therefore, the accurate prediction of pKa values of protein residues is crucial for understanding enzymatic activity and protein-ligand binding, which are fundamental to drug discovery. Despite significant time and resources being invested to develop computational methods for protein residue pKa prediction, the accuracy of existing tools, such as the widely used PROPKA, remains limited. In this study, an integrated framework that fuses molecular dynamics simulations and deep learning models is proposed to improve the predictive accuracy of pKa values for ionizable residues. Specifically, we employ high-throughput molecular modeling using the AMOEBA polarizable force field to construct a protein structure data set enriched with atomic electrostatics and other physics-inspired features. Using the experimentally determined pKa values from the PKAD-2 data set, we trained three graph-based neural network models. All three models demonstrated substantial improvements in prediction accuracy across four ionizable residue types (aspartic acid, glutamic acid, lysine, and histidine) when compared to PROPKA3.5.1, with the graph attention network-based model exhibiting both high accuracy and strong generalizability when benchmarked against several recently published machine learning models. Beyond these improvements in predictive performance, feature importance analysis of the best-performing models revealed physically meaningful patterns of the descriptive features that aligned with the underlying biophysical principles governing protein residue pKa values, most notably the complexity of the local microenvironment and the atomic geometric arrangement within the protein structure. Together, the trained pKa models and the curated dipole moment-enhanced data set based on a polarizable FF offer a valuable resource for the research community, with potential applications in early-stage drug target identification and protein engineering.
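
To make the link between a residue's pKa and its protonation state at a given pH concrete, the Henderson-Hasselbalch relation gives the deprotonated fraction as 1 / (1 + 10^(pKa - pH)). The snippet is a textbook illustration, not part of the paper's pipeline, and the function name is hypothetical.

```python
def deprotonated_fraction(pka, ph):
    """Fraction of an ionizable group that has donated its proton.

    Henderson-Hasselbalch: [A-]/[HA] = 10 ** (pH - pKa), so the
    deprotonated fraction is 1 / (1 + 10 ** (pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# A group is half deprotonated when pH equals its pKa; one pH unit above
# the pKa the ratio [A-]/[HA] is 10, i.e. a fraction of 10/11.
half = deprotonated_fraction(6.5, 6.5)
one_unit_above = deprotonated_fraction(6.5, 7.5)
```

Because the fraction depends on 10^(pKa - pH), an error of one pKa unit in a prediction changes the predicted protonated-to-deprotonated ratio tenfold, which is why prediction accuracy matters for residues whose pKa lies near physiological pH.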



Citations: 0

scACAN: An Adaptive Learning Framework Aggregating Local Graph Structure Context for Rare Cell Type Identification.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-22 DOI: 10.1021/acs.jcim.5c02735

Shijia Yan,Junliang Shang,Shoujia Jiang,Xiaohan Zhang,Fanyu Zhang,Yan Sun,Jin-Xing Liu

Single-cell RNA sequencing (scRNA-seq) technology has become an essential tool for dissecting cellular heterogeneity and elucidating complex biological systems. Nevertheless, the uneven distribution of cell types and the limited representation of rare cell populations present substantial challenges for effective modeling and accurate identification. Most existing methods primarily focus on the annotation of abundant cell types, often overlooking rare, yet biologically significant subpopulations. In addition, the variability of cellular distributions across different biological contexts highlights the need for models with greater adaptability and a stronger capacity for contextual information integration. To overcome these challenges, we introduced scACAN, an adaptive graph construction framework that leverages aggregated local graph context information to design a positive sample selection strategy. By incorporating adaptive sampling and iterative optimization based on clustering results, scACAN effectively enhances the identification of both the major and rare cell types. Comprehensive experiments on multiple real-world scRNA-seq data sets demonstrate that scACAN achieves superior performance and reveals additional biologically meaningful rare cell subpopulations, providing a robust and generalizable solution for single-cell data analysis.



Citations: 0

Understanding the Kinetic Mechanism of Ligands Stabilizing the RAS-CYPA Interaction.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-22 DOI: 10.1021/acs.jcim.5c02966

Kexin Xu,Mingyun Shen,Zhe Wang,Sutong Xiang,Qirui Deng,Kaimo Yang,Zhiliang Jiang,Zihao Wang,Chen Yin,Tingjun Hou,Huiyong Sun

Molecular glues, including protein degraders and protein-protein interaction (PPI) stabilizers, have emerged as a new paradigm of drug design for regulating interactions between biomacromolecules, yet the rational design of molecular glues remains a challenge. KRAS, a prevalent oncogenic driver, is notoriously difficult to target with traditional small-molecule drugs because of its challenging binding surface and frequent mutations. Although the small-molecule drug RMC7977 has been designed as a PPI stabilizer for the inherently weak RAS-CYPA interaction, the precise molecular mechanism underlying its stabilization effect and selectivity difference requires a deeper understanding. To this end, we leverage an integrated computational strategy combining molecular dynamics (MD) simulation, end-point binding free-energy calculation, and enhanced sampling technologies to elucidate the dynamic characteristics of RAS-ligand-CYPA interactions. Our results exhibit a high correlation between the predicted binding affinities and the experimental observations, demonstrating that RMC7977, acting as a strong PPI stabilizer, significantly enhances the stability of the KRAS-CYPA interaction by delicately remodeling the protein-protein interface and thereby optimizing various interactions. Moreover, the results also uncover the dynamic process of stabilizer-mediated KRAS-CYPA stabilization and the mechanistic origin of the binding selectivity. This study provides essential molecular-level insights into RMC7977's function and offers a valuable computational framework for evaluating the stabilization effect of ligands targeting the KRAS-CYPA and other challenging PPI systems.
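
An end-point binding free-energy estimate is, at its core, a difference of trajectory-averaged energies: ΔG_bind ≈ ⟨G_complex⟩ − ⟨G_A⟩ − ⟨G_B⟩. The sketch below shows only that arithmetic. It assumes per-frame energies have already been computed by an external tool, omits any entropy term, and leaves open how the ternary RAS-ligand-CYPA system is split into partners A and B; the function name is hypothetical.

```python
from statistics import mean


def endpoint_binding_energy(complex_energies, partner_a_energies, partner_b_energies):
    """Estimate a binding free energy from per-frame end-point energies.

    Each argument is a sequence of per-frame energies (same units, e.g.
    kcal/mol) for the complex and for the two separated partners.
    Returns <G_complex> - <G_A> - <G_B>; more negative means tighter binding.
    No entropy correction is applied in this sketch.
    """
    if not (complex_energies and partner_a_energies and partner_b_energies):
        raise ValueError("each energy series needs at least one frame")
    return (
        mean(complex_energies)
        - mean(partner_a_energies)
        - mean(partner_b_energies)
    )
```

Comparing this quantity with and without the ligand bound is one way a stabilization effect could be quantified, although the abstract does not specify the exact protocol used.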



Citations: 0

CompBind: Complex Guided Pretraining-Based Structure-Free Protein-Ligand Affinity Prediction.

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-21 DOI: 10.1021/acs.jcim.5c02451

Duoyun Yi,Yanpeng Zhao,Huiyan Xu,Yixin Zhang,Mengxuan Wan,Peng Zan,Song He,Xiaochen Bo

Accurate prediction of protein-ligand binding affinity is essential in drug discovery. However, the limited availability and high cost of experimentally resolved protein-ligand complex structures significantly hinder the generalizability and broad applicability of current structure-based deep learning approaches. To address this challenge, we present CompBind, a novel framework for binding affinity prediction that leverages latent interaction patterns learned from existing complex structures while eliminating the need for 3D structural inputs during inference. Specifically, CompBind integrates bidirectional cross-attention with a dual-objective pretraining strategy, where contrastive learning enforces feature-space consistency between monomer pairs and their corresponding complex structures, while generative learning reconstructs interaction features to model the bidirectional mapping between monomeric and complex representations. This enables the model to infer binding representations directly from protein and ligand sequences alone. Across challenging affinity prediction scenarios, including cold-start and sparse-label conditions, CompBind not only outperforms noncomplex-based methods but also competitively rivals complex-based prediction approaches. In a drug repurposing case study targeting glutathione peroxidase 4 (GPX4), a clinically relevant but traditionally undruggable protein, CompBind successfully ranked known inhibitors among the top candidates. Furthermore, the built-in attention mechanism enhances model interpretability by identifying key binding residues. By decoupling predictive accuracy from the availability of experimental complex structures, CompBind offers a scalable, generalizable, and practical solution for accelerating drug discovery pipelines.
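
The contrastive objective mentioned in this abstract is commonly implemented as an InfoNCE loss, which pulls a monomer-pair embedding toward the embedding of its own complex and pushes it away from the others. The snippet is a generic illustration of that loss, not CompBind's implementation; the dot-product scoring, the temperature value, and the function name are assumptions.

```python
import math


def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE loss for one anchor against a list of candidate embeddings.

    loss = -log( exp(s_pos / t) / sum_j exp(s_j / t) ),
    where s_j is the dot product between the anchor and candidate j.
    """
    logits = [
        sum(a * c for a, c in zip(anchor, cand)) / temperature
        for cand in candidates
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[positive_index]
```

The loss is small when the matching complex scores highest among the candidates and equals log(N) when all N candidates are indistinguishable, which makes it a convenient sanity check during pretraining.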



Citations: 0

ToxFCDB: Toxicity Database for Forever Chemicals

IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-21 DOI: 10.1021/acs.jcim.5c01917

Meetali Sinha,Deepak Kumar Sachan,Joy Chakraborty,Anamta Ali,Anshika Gupta,Tanya Jamal,Ramakrishnan Parthasarathi

Per- and polyfluoroalkyl substances (PFAS)/forever chemicals are persistent synthetic chemicals with widespread use in a variety of consumer and industrial products. Some of these chemicals have undergone exhaustive research regarding experimental toxicity testing and human epidemiological inference; however, most compounds contain little or no information about their hazards or safety. ToxFCDB prioritizes these data-poor compounds for detailed toxicity investigations by constructing an effective web-based database for in silico preliminary evaluations employing more than 50 QSAR models/databases. The database compiles 8204 PFAS with their molecular structures, chemical classification, physicochemical and toxicokinetic properties, molecular descriptors, toxicological data, chemical genes, and human targets. This database aims to assist industrialists, policymakers, and researchers in assessing state-of-the-art data-centric information to make informed decisions to safeguard public health and the environment. In addition, the ToxFCDB could be a valuable tool for encouraging additional toxicological research in the domain of redesigning chemicals and polymers. The ToxFCDB is accessible online at http://ctf.iitr.res.in/toxfcdb/.



Citations: 0

DeepDBPI: DNA-Binding Protein Identifier Using a Deep Learning Model with Transformed Denoised Features

IF 5.6 Zone 2 Chemistry Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-21 DOI: 10.1021/acs.jcim.5c02637

Kamran Arshad,Muhammad Arif,Dong-Jun Yu

Motivation: DNA-binding proteins (DBPs) play a significant role throughout biological systems, and many DNA-related studies seek to determine whether a given protein binds to DNA. Conventionally, wet-lab experiments are conducted to characterize DBP functions, but these methods are often expensive and time-intensive. With the rapid advancement of bioinformatics, there is a growing demand for efficient computational protocols to predict DBPs. Several sequence-based computational tools have been designed for this purpose; however, room for improvement remains. Method: We developed a novel deep learning (DL)-based predictor, called DeepDBPI, to enhance DBP prediction. DeepDBPI leverages the evolutionary and graphical properties of protein sequences using novel descriptors, namely the covariance correlation-based position-specific scoring matrix (CC-PSSM), binary-profile-based PSSM (BP-PSSM), trigram PSSM (TRG-PSSM), and feature encoding based on graphical and statistical (FEGS) methods. We then applied the wavelet denoising (WD) algorithm to remove noise from the sequence-derived features and fed the filtered features to ResNet, LSTM, BiLSTM, RNN, BiRNN, and BiGRU. Results: Under 5-fold cross-validation, evaluated by accuracy (ACC), sensitivity (SN), specificity (SP), and Matthews correlation coefficient (MCC), DeepDBPI achieved its best prediction performance with BiGRU on the denoised FEGS encoding. On the independent test, the proposed model achieved 92.13% ACC, 93.07% SN, 91.19% SP, and 0.8427 MCC. We believe the developed bioinformatics protocol provides insights for drug discovery and other proteomic problems. All data, including the dataset, feature extraction techniques, and models, are available at: https://doi.org/10.5281/zenodo.17496063
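
For readers less familiar with the four reported metrics, the sketch below shows how ACC, SN, SP, and MCC follow from a binary confusion matrix. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics` and the counts are hypothetical, and the counts assume a balanced independent test set, which the abstract does not state. Under that assumption the reported SN and SP are consistent with the reported ACC and MCC.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Return (ACC, SN, SP, MCC) computed from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)   # fraction of all predictions that are correct
    sn = tp / (tp + fn)                     # sensitivity: recall on true binders
    sp = tn / (tn + fp)                     # specificity: recall on true non-binders
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sn, sp, mcc


# Hypothetical counts: 10,000 DBPs and 10,000 non-DBPs (balance is an assumption),
# chosen so that SN and SP match the values reported in the abstract.
acc, sn, sp, mcc = binary_metrics(tp=9307, fn=693, tn=9119, fp=881)
print(f"ACC={acc:.4f} SN={sn:.4f} SP={sp:.4f} MCC={mcc:.4f}")
```

With a different class ratio the same SN and SP would yield a different ACC and MCC, which is why MCC is often reported alongside accuracy.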



Citations: 0

CACHE Challenge #3: Targeting the Nsp3 Macrodomain of SARS-CoV-2

IF 5.6 Zone 2 Chemistry Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-21 DOI: 10.1021/acs.jcim.5c02441

Oleksandra Herasymenko,Madhushika Silva,Galen J. Correy,Abd Al-Aziz A. Abu-Saleh,Suzanne Ackloo,Cheryl Arrowsmith,Alan Ashworth,Fuqiang Ban,Hartmut Beck,Kevin P. Bishop,Hugo J. Bohórquez,Albina Bolotokova,Marko Breznik,Irene Chau,Yu Chen,Artem Cherkasov,Wim Dehaen,Dennis Della Corte,Katrin Denzinger,Niklas P. Doering,Kristina Edfeldt,Aled Edwards,Darren Fayne,Francesco Gentile,Elisa Gibson,Ozan Gokdemir,Anders Gunnarsson,Judith Günther,John J. Irwin,Jan Halborg Jensen,Rachel J. Harding,Alexander Hillisch,Laurent Hoffer,Anders Hogner,Ashley Hutchinson,Shubhangi Kandwal,Andrea Karlova,Kushal Koirala,Sergei Kotelnikov,Dima Kozakov,Juyong Lee,Soowon Lee,Uta Lessel,Sijie Liu,Xuefeng Liu,Peter Loppnau,Jens Meiler,Rocco Moretti,Yurii S. Moroz,Charuvaka Muvva,Tudor I. Oprea,Brooks Paige,Amit Pandit,Keunwan Park,Gennady Poda,Mykola V. Protopopov,Vera Pütter,Rahul Ravichandran,Didier Rognan,Edina Rosta,Yogesh Sabnis,Thomas Scott,Almagul Seitova,Purshotam Sharma,François Sindt,Minghu Song,Casper Steinmann,Rick Stevens,Valerij Talagayev,Valentyna V. Tararina,Olga Tarkhanova,Damon Tingey,John F. Trant,Dakota Treleaven,Alexander Tropsha,Patrick Walters,Jude Wells,Yvonne Westermaier,Gerhard Wolber,Lars Wortmann,Shuangjia Zheng,James S. Fraser,Matthieu Schapira

The third Critical Assessment of Computational Hit-finding Experiments (CACHE) challenged computational teams to identify chemically novel ligands targeting the macrodomain 1 of SARS-CoV-2 Nsp3, a promising coronavirus drug target. Twenty-three groups deployed diverse design strategies to collectively select 1739 ligand candidates. While over 85% of the designed molecules were chemically novel, the best experimentally confirmed hits were structurally similar to previously published compounds. Confirming a trend observed in CACHE #1 and #2, two of the best-performing workflows used compounds selected by physics-based computational screening methods to train machine learning models able to rapidly screen large chemical libraries, while four others used exclusively physics-based approaches. Three pharmacophore searches and one fragment growing strategy were also part of the seven winning workflows. While active molecules discovered by CACHE #3 participants largely mimicked the adenine ring of the endogenous substrate, ADP-ribose, preserving the canonical chemotype commonly observed in previously reported Nsp3-Mac1 ligands, they still provide novel structure–activity relationship insights that may inform the development of future antivirals. Collectively, these results show that multiple molecular design strategies can efficiently converge on similar potent molecules.
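
The abstract does not say how chemical novelty or structural similarity was scored. One common way to make such statements concrete is the Tanimoto coefficient over fingerprint bits, sketched below. Everything here is an assumption for illustration: the helper names `tanimoto` and `is_novel`, the 0.4 cutoff, and the toy bit sets are hypothetical and are not taken from the CACHE protocol.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def is_novel(candidate: frozenset, known: list, threshold: float = 0.4) -> bool:
    """Treat a candidate as novel if it stays below the cutoff against every known ligand.

    The 0.4 threshold is a hypothetical value chosen for this example.
    """
    return all(tanimoto(candidate, k) < threshold for k in known)


# Toy fingerprints (sets of bit indices), not real molecules.
known = [frozenset({1, 4, 7, 9, 12}), frozenset({2, 4, 8, 15})]
close_analog = frozenset({1, 4, 7, 9, 13})   # shares 4 of 6 distinct bits with known[0]
new_scaffold = frozenset({3, 5, 20, 21})     # shares no bits with either known ligand

print(is_novel(close_analog, known))  # False: similarity 4/6 exceeds the cutoff
print(is_novel(new_scaffold, known))  # True: similarity is 0 to both
```

In practice the bit sets would come from a cheminformatics toolkit's fingerprinting routine, and the cutoff would be set by the challenge organizers rather than picked by hand.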



Citations: 0

Ab-SELDON: Leveraging Diversity Data for an Efficient Automated Computational Pipeline for Antibody Design.

IF 5.6 Zone 2 Chemistry Q1 CHEMISTRY, MEDICINAL

Journal of Chemical Information and Modeling

Pub Date : 2026-01-20 DOI: 10.1021/acs.jcim.5c01924

Jean V Sampaio,Andrielly H S Costa,Aline O Albuquerque,Júlia S Souza,Diego S Almeida,Eduardo M Gaieta,Matheus V Almeida,Geraldo R Sartori,João H M Silva

The utilization of predictive tools has become increasingly prevalent in the development of biopharmaceuticals, reducing the time and cost of research. However, most methods for computational antibody design are hampered by their reliance on scarcely available antibody structures, potential for immunogenic modifications, and a restricted exploration of the paratope's potential chemical and conformational space. We propose Ab-SELDON, a modular and easily customizable antibody design pipeline capable of iteratively optimizing an antibody-antigen (Ab-Ag) interaction in five different modification steps, including CDR and framework grafting, and mutagenesis. The optimization process is guided by diversity data collected from millions of publicly available human antibody sequences. This approach enhanced the exploration of the chemical and conformational space of the paratope during computational tests involving the optimization of an anti-HER2 antibody. Optimization of another antibody against Gal-3BP stabilized the Ab-Ag interaction in molecular dynamics simulations at lower runtime than alternative pipelines. Tests with SKEMPI's Ab-Ag mutations also demonstrated the pipeline's ability to correctly identify the effect of the majority of mutations, especially multipoint and those that increased binding affinity. This freely available pipeline presents a new approach for computationally efficient and automated in silico antibody design, thereby facilitating the development of new biopharmaceuticals.
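
To illustrate what guiding mutagenesis with diversity data can mean in the simplest case, the sketch below derives per-position residue frequencies from a set of aligned sequences and proposes only substitutions that are common at that position. This is not the Ab-SELDON implementation: the function names, the 0.3 frequency cutoff, and the four-residue toy sequences are all hypothetical, and the real pipeline's scoring and modification steps are not described in enough detail in the abstract to reproduce.

```python
from collections import Counter


def position_frequencies(aligned_seqs):
    """Per-position residue frequencies from equal-length aligned sequences."""
    freqs = []
    for i in range(len(aligned_seqs[0])):
        counts = Counter(seq[i] for seq in aligned_seqs)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()})
    return freqs


def candidate_mutations(parent, freqs, min_freq=0.3):
    """List (position, wild_type, mutant, frequency) for residues seen often enough.

    min_freq is a hypothetical cutoff; residues rarer than this are skipped.
    """
    out = []
    for i, (wt, table) in enumerate(zip(parent, freqs)):
        for aa, f in sorted(table.items()):
            if aa != wt and f >= min_freq:
                out.append((i, wt, aa, f))
    return out


# Toy aligned repertoire standing in for public human antibody sequences.
repertoire = ["ARDY", "ARDY", "AKDY", "ARGY", "AKDF"]
parent = "ARDY"
muts = candidate_mutations(parent, position_frequencies(repertoire))
print(muts)  # only R->K at index 1 passes the cutoff (observed in 2 of 5 sequences)
```

Restricting proposals to substitutions that occur naturally is one plausible way to limit immunogenic modifications; each surviving candidate would still need to be scored against the antigen.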



Citations: 0