Journal of Chemical Information and Modeling: Latest Articles
Uncertainty Quantification in Molecular Machine Learning for Property Predictions under Data Shifts
IF 5.3 | Zone 2, Chemistry | Q1 Chemistry, Medicinal | Pub Date: 2026-01-14 | DOI: 10.1021/acs.jcim.5c02381
Raquel Parrondo-Pizarro, Jessica Lanini, and Raquel Rodríguez-Pérez*

Drug discovery and medicinal chemistry efforts are increasingly influenced by machine learning (ML), with compound property prediction as a central application. ML models have demonstrated strong performance in predicting various compound properties from chemical structure. However, these models can exhibit varying levels of prediction error, making uncertainty quantification (UQ) essential for informed decisions. Standard UQ metrics include the distance to the molecules in the training set and the prediction variance obtained through methods such as model ensembles or Bayesian modeling. Although several UQ methodologies have been developed in recent years, no single approach has consistently outperformed the others. Herein, we present a comprehensive benchmark of UQ strategies for ML-based prediction of absorption, distribution, metabolism, and excretion (ADME) properties, using both in-house and public data sets. We employed the recently introduced UNIQUE (UNcertaInty QUantification bEnchmarking) framework and evaluated UQ method performance under data shifts. Our findings indicate that data-based UQ metrics (e.g., chemical distance) and model-based UQ metrics (e.g., predicted value and variance) may capture complementary aspects of uncertainty. Their combination through error models, designed to predict the original ML model’s error, yielded higher-quality uncertainty estimates. These error models emerged as a promising strategy for enhancing UQ, showing robustness under various degrees and types of data shift. Taken together, our work highlights the potential of combining diverse UQ metrics and error modeling to improve reliability in molecular property prediction. By establishing standardized evaluation setups and assessing UQ under data shifts, we provide a foundation for future UQ method development and benchmarking in the field.
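
The idea of feeding a data-based metric and model-based metrics into an error model can be illustrated with a small sketch. This is not the UNIQUE framework's API or the authors' implementation: the function names, the toy fingerprints, the calibration values, and the choice of a k-nearest-neighbour regressor as the error model are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - Tanimoto similarity between two sets of fingerprint bit indices."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def uq_features(fp, ensemble_preds, train_fps):
    """Return (distance to nearest training molecule, ensemble variance, mean prediction)."""
    dist = min(tanimoto_distance(fp, t) for t in train_fps)
    return (dist, pvariance(ensemble_preds), mean(ensemble_preds))


def knn_error_model(calib_features, calib_abs_errors, k=2):
    """Toy error model: average absolute error of the k nearest calibration points."""
    def predict(features):
        ranked = sorted(
            zip(calib_features, calib_abs_errors),
            key=lambda pair: sum((u - v) ** 2 for u, v in zip(pair[0], features)),
        )
        return mean(err for _, err in ranked[:k])
    return predict


# Hypothetical toy data: fingerprints are sets of "on" bits.
train_fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
# Each calibration entry: (fingerprint, ensemble predictions, observed |error|).
calib = [
    (frozenset({1, 2, 3}), [1.0, 1.1, 0.9], 0.05),
    (frozenset({2, 3, 5}), [1.4, 1.6, 1.5], 0.10),
    (frozenset({7, 8, 9}), [0.2, 1.8, 1.0], 0.90),
    (frozenset({6, 8, 9}), [0.1, 1.5, 2.0], 1.10),
]
calib_X = [uq_features(fp, preds, train_fps) for fp, preds, _ in calib]
calib_err = [err for _, _, err in calib]
error_model = knn_error_model(calib_X, calib_err, k=2)

# One query close to the training set, one far from it with a disagreeing ensemble.
in_domain = uq_features(frozenset({1, 2, 4}), [1.2, 1.3, 1.25], train_fps)
shifted = uq_features(frozenset({7, 9, 10}), [0.3, 1.9, 1.1], train_fps)
print(error_model(in_domain), error_model(shifted))
```

In this toy setup the shifted query receives a much larger predicted error than the in-domain one. The three features are left unscaled for brevity; a practical error model would likely standardize them and use a stronger regressor.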

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 923–935. PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02381
Citations: 0
Discovery of a Covalent Small-Molecule eEF1A1 Inhibitor via Structure-Based Virtual Screening
IF 5.3 | Zone 2, Chemistry | Q1 Chemistry, Medicinal | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c02496
Yangping Deng, Sizheng Li, Liang Wang, Jianping Lin, Haohao Fu, Jing Li*, and Yue Chen*

Pancreatic cancer remains a formidable health challenge due to its late-stage diagnosis and limited therapeutic options, underscoring the need for novel targets and modalities. Our previous work revealed that the natural product BE-43547A2 could effectively inhibit the progression of pancreatic cancer by covalently binding to eukaryotic translation elongation factor 1 α 1 (eEF1A1) at Cys234 (C234). Considering its critical role in protein synthesis and its association with pancreatic cancer progression, eEF1A1 is a promising novel target for pancreatic cancer. However, rational drug design methods for eEF1A1 are largely lacking. Herein, using microsecond-scale molecular dynamics (MD) simulations, we identify a suitable eEF1A1 conformation for structure-based virtual screening (SBVS) targeting residue C234. Through a tailored SBVS pipeline, we identified AKOS-04 as a novel small-molecule covalent inhibitor with nanomolar potency (IC50 = 28.5 ± 2.86 nM in the PATU8988T cell line). Notably, cellular thermal shift assays (CETSA) with dithiothreitol (DTT) and iodoacetamide (IAM) treatments confirmed the covalent, cysteine-mediated interaction between AKOS-04 and eEF1A1. Further structural modification validated the critical contribution of the double bond in the acrylamide group of AKOS-04 to its covalent binding with eEF1A1: compound 9, in which this double bond is replaced by a single bond, lost its inhibitory activity. MST experiments confirmed direct binding of the compounds to the eEF1A1 protein. AKOS-04 exhibited the strongest binding among the tested compounds, consistent with effective covalent target engagement. Finally, MD simulations and pair-interaction energy analyses highlighted Lys84, Arg218, and Glu230 of eEF1A1 as key residues driving its binding interactions with AKOS-04. These results reveal that AKOS-04, identified by SBVS against C234 of eEF1A1, represents a promising lead for eEF1A1-targeted pancreatic cancer therapy, highlighting the power of computational approaches in covalent drug discovery.

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 1083–1096.
Citations: 0
PepFoundry: A Pipeline for Building Machine-Learning Ready Representations of Nonstandard Peptides Containing Cycles, Non-natural Residues, Polymer Units, and More
IF 5.3 | Zone 2, Chemistry | Q1 Chemistry, Medicinal | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c02629
Daniel Garzon Otero, Omid Akbari, Aneesh Mandapati, and Camille Bilodeau*

Peptides featuring synthetic modifications, such as noncanonical amino acids, backbone modifications, cyclic structures, and polymer units, have become central to modern drug design due to their enhanced stability and functional diversity. However, current machine learning (ML) approaches are restricted by challenges associated with transforming peptide sequences into atom-level representations, leading ML efforts to focus largely on datasets containing linear peptides composed of standard residues. Here, we present PepFoundry, a Python package that handles peptide sequences beyond canonical amino acids and linear topologies by using SMILES strings in the CHUCKLES format. PepFoundry generates atom-mapped RDKit molecule objects, enabling the extraction of atom-level features, such as Morgan fingerprints and graph representations. We demonstrate its utility by processing a dataset of peptide sequences containing noncanonical amino acids and generating atomic-level features for downstream property prediction. We show that atomic-level representations of peptides containing noncanonical amino acids consistently outperform sequence-level representations, regardless of model type. We additionally explore the representation of noncanonical peptides through latent space visualization and show that models with atomic-level information can effectively learn relationships between analogous sequences of l-peptides, d-peptides, and peptoids. This framework allows for the flexible incorporation of new amino acid chemistries, enabling existing ML methods to be straightforwardly applied to datasets of peptides containing nonstandard features. It also facilitates the rapid construction of customized peptide libraries and provides a scalable platform to accelerate ML-driven peptide discovery and optimization.

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 1264–1273. PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02629
Citations: 0
DNACSE: Enhancing Genomic LLMs with Contrastive Learning for DNA Barcode Identification
IF 5.3 | Zone 2, Chemistry | Q1 Chemistry, Medicinal | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c02747
Jiadong Wang, Bin Wang*, Shihua Zhou*, Ben Cao*, Wei Li, and Pan Zheng

DNA barcoding is a powerful tool for exploring biodiversity, and DNA language models have significantly facilitated barcode construction and identification. However, because DNA barcodes come from a specific region of mitochondrial DNA and differ structurally from the reference genomes used to train existing DNA language models, it is difficult to apply those models directly to the DNA barcoding task. To address this, this paper introduces DNACSE (DNA Contrastive Learning for Sequence Embeddings), an unsupervised noise-contrastive learning framework designed to fine-tune the DNA language foundation model while enhancing the distribution of the embedding space. The results demonstrate that DNACSE outperforms the direct use of DNA language models in DNA barcoding-related tasks. Specifically, in fine-tuning and linear probe tasks, it achieves accuracy rates of 99.17% and 98.31%, respectively, surpassing the current state-of-the-art BarcodeBERT by 6.44% in each case. In zero-shot clustering tasks, it raises the adjusted mutual information (AMI) score to 92.25%, an improvement of 8.36%. In addition, evaluations on zero-shot and genomic benchmarks indicate that DNACSE enhances the performance of DNA language models in generalized genomic tasks. In summary, DNACSE has demonstrated excellent performance in DNA barcode species classification by making full use of multispecies information and DNA barcode information, providing a feasible way to further explore and protect biodiversity. The code repository is available at https://github.com/Kavicy/DNACSE.
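
For readers unfamiliar with contrastive objectives, the following sketch shows an InfoNCE-style loss over a batch of embeddings. It is an assumption that a loss of this general family is relevant here; the abstract does not give the exact DNACSE objective, and the function names, temperature value, and toy embeddings below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss: anchors[i] should match positives[i] and no other positive in the batch."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        max_logit = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical 2-D embeddings of two sequences, each seen under two noisy views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # second views close to their own anchors
swapped = [[0.1, 0.9], [0.9, 0.1]]   # second views close to the wrong anchors

loss_aligned = info_nce(view_a, aligned)
loss_swapped = info_nce(view_a, swapped)
print(loss_aligned < loss_swapped)
```

Minimizing such a loss pulls two views of the same sequence together and pushes different sequences apart, which is one common way to reshape an embedding space before clustering or linear probing.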

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 976–993.
Citations: 0
The Blobulator: A Toolkit for Identification and Visual Exploration of Hydrophobic Modularity in Protein Sequences
IF 5.3 | Zone 2, Chemistry | Q1 Chemistry, Medicinal | Pub Date: 2026-01-13 | DOI: 10.1021/acs.jcim.5c01585
Connor Pitman, Ezry Santiago-McRae, Ruchi Lohia, Ryan Lamb, Kaitlin Bassi, Lindsey Riggs, Thomas T. Joseph, Matthew E. B. Hansen, and Grace Brannigan*

While contiguous subsequences of hydrophobic residues are essential to protein structure and function, as in the hydrophobic core and transmembrane regions, there are currently no bioinformatics tools for module identification that focus on hydrophobicity. To fill this gap, we created the blobulator toolkit for detecting, visualizing, and characterizing hydrophobic modules in protein sequences. This toolkit uses our previously developed algorithm, blobulation, which was critical both in interpreting intraprotein contacts in a series of intrinsically disordered protein simulations (Lohia et al., 2019) and in defining the “local context” around disease-associated mutations across the human proteome (Lohia et al., 2022). The blobulator toolkit provides accessible, interactive, and scalable implementations of blobulation. These are available via a webtool, a visual molecular dynamics (VMD) plugin, and a command line interface. We highlight use cases for visualization, interaction analysis, and modular annotation through three example applications: a globular protein, two orthologous membrane proteins, and an intrinsically disordered protein. The blobulator webtool can be found at www.blobulator.branniganlab.org, and the source code with a pip-installable command line tool, as well as the VMD plugin with installation instructions, can be found on GitHub at www.GitHub.com/BranniganLab/blobulator.
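
The general idea of finding contiguous hydrophobic segments in a sequence can be sketched as follows. This is a simplified illustration, not the blobulator's code or API: the use of the Kyte-Doolittle scale rescaled to 0..1, the smoothing window, the threshold, and the minimum length are assumptions chosen for the example, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(seq, window=3, threshold=0.4, min_length=4):
    """Return (start, end) half-open index pairs of contiguous hydrophobic runs."""
    # Rescale hydropathy from [-4.5, 4.5] to [0, 1].
    scaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq]

    # Smooth with a centered moving average, truncated at the sequence ends.
    half = window // 2
    smoothed = []
    for i in range(len(scaled)):
        chunk = scaled[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))

    # Collect runs above the threshold that are at least min_length long.
    segments, start = [], None
    for i, h in enumerate(smoothed + [float("-inf")]):  # sentinel closes a trailing run
        if h > threshold and start is None:
            start = i
        elif h <= threshold and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    return segments


sequence = "DDKKLLVVIIAADDKK"  # hypothetical toy sequence
segments = hydrophobic_segments(sequence)
print([(s, e, sequence[s:e]) for s, e in segments])
```

Raising the threshold or the minimum length yields fewer, more strongly hydrophobic segments; lowering them merges more of the sequence into hydrophobic runs.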

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 820–828. PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c01585
Citations: 0
Application of Semiempirical Quantum Mechanical Methods To Accurately Estimate Ligand-Binding Structure in Biological Systems: Protein Kinase Case Study
IF 5.3 | Zone 2, Chemistry | Q1 Chemistry, Medicinal | Pub Date: 2026-01-12 | DOI: 10.1021/acs.jcim.5c02274
Charles-Alexandre Mattelaer*, Roberto Fino, Christian Permann, Thierry Langer, Edgar Jacoby, and Jeremy N. Harvey

This study presents a quantum mechanical (QM)-based workflow utilizing semiempirical methods for estimating ligand binding poses in protein–ligand complexes, focusing on kinases of pharmaceutical relevance: CDK2, CK2, p38α, and CAMK1DA. The protocol integrates xTB-based docking with PM6-D3H4X rescoring within a QM/MM framework, aiming to enhance the structural accuracy over traditional docking methods. Drug-like and fragment-like ligands were tested across different receptor structures to assess robustness for CDK2 and CK2. Results show that for drug-like ligands the QM approach reproduces experimental poses. However, performance is less consistent for the fragment-like ligands we have considered, perhaps due to structural ambiguity and weak binding interactions limiting accuracy.

Journal of Chemical Information and Modeling, vol. 66, no. 2, pp. 1231–1240.
Citations: 0
Enhancing CYP450-Ligand Binding Predictions: A Comparative Analysis of Ligand-Based and Hybrid Machine Learning Models
IF 5.3 | Zone 2, Chemistry | Q1 Chemistry, Medicinal | Pub Date: 2026-01-12 | DOI: 10.1021/acs.jcim.5c01098
Anastasiia Tikhonova, Eric Chun Yong Chan, and Hao Fan*

Predicting cytochrome P450 (CYP450) ligand binding is critical in early-stage drug discovery as CYP450-mediated metabolism profoundly influences drug efficacy, safety, and adverse reaction risks. However, experimental determination of CYP450-ligand interactions remains resource- and time-intensive, underscoring the need for robust computational alternatives. While ligand-based methods are commonly employed, they often fail to fully account for structural intricacies governing protein–ligand interactions. To address this gap, we developed a hybrid machine learning framework integrating ligand descriptors, protein descriptors, and protein–ligand interaction descriptors that include molecular docking-derived parameters, rescoring function components from multiple algorithms, and structural interaction fingerprints (SIFt). Evaluated on CYP1A2 and CYP17A1 isoforms, our model demonstrated superior predictive accuracy in cross-validation compared with stand-alone molecular docking and ligand-based approaches. Furthermore, benchmarking against state-of-the-art tools (SwissADME and ADMETlab 3.0) revealed enhanced performance in binding prediction. This work establishes a versatile framework for advancing computational tools to prioritize CYP450 binding assessments during drug discovery.

Citations: 0
Multi-Solvent Graph Neural Network for Reduction Potential Prediction Across the Chemical Space
IF 5.3 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-12 DOI: 10.1021/acs.jcim.5c01450
Rostislav Fedorov, Anastasiia Nihei, and Ganna Gryn’ova*

Reduction potentials of redox-active molecules and materials are essential descriptors of their performance as catalysts, antioxidants, electrode materials, etc. For a given species, its practical applications often span a range of solvent environments, which profoundly impact its redox properties. In this work, we present a message passing graph neural network architecture with a Set Transformer readout trained on ca. 20,000 reduction potentials of chemically diverse closed- and open-shell organic redox-active molecules (the “ReSolved” data set), computed using a rigorously benchmarked density functional theory procedure. The predictor model affords high accuracy with mean absolute errors of ca. 0.2 eV and is uniquely able to generalize to previously unseen solvents. We couple this architecture with an evolutionary algorithm to inverse-design synthetically accessible candidate molecules with target reduction potentials for several battery-related practical applications.

Citations: 0
Curated and Structure-Based Drug–Target Interactions Improve Underprediction of Drug Side Effects in Network Models
IF 5.3 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-10 DOI: 10.1021/acs.jcim.5c01822
Mohammadali Alidoost, Amy Le, and Jennifer L. Wilson*

The accurate prediction of drug-induced side effects remains a significant challenge in pharmaceutical development, particularly in early development, as drug programs often fail due to unforeseen adverse reactions. Conventional approaches, such as preclinical animal testing and in vitro assays, are limited by high costs, ethical concerns, and reduced translatability to human biology. Predictive algorithms, including protein–protein interaction network models, have emerged as a computational approach with the potential to predict adverse drug effects, but these models often suffer from limited performance, specifically underprediction. Further, the documented drug–protein interactions are inconsistently reported, affecting our ability to select data for predicting drug-induced side effects or building more performant models. We integrated drug-binding targets from six sources: DrugBank, ChEMBL, PubChem, Search Tool for Interacting Chemicals, Therapeutic Target Database, and PocketFEATURE into our existing platform, PathFX, to understand their impact on the prediction of drug side effects. We observed unique drug–target interactions and target-associated protein classes and functions across sources. Integrating new drug targets predicted previously unrecognized side effects and revealed a trade-off between sensitivity and specificity. Sensitivity generally improved using large exploratory databases or the union of all targets, at the cost of reduced specificity. Databases with smaller numbers of curated targets or structurally predicted targets improved specificity. This quantitative analysis lays the foundation for improvement of drug-side effect prediction, where sophisticated machine learning approaches may better leverage large exploratory databases when balanced and performant analysis is required, and smaller, curated data sources could be integrated with simple but explainable platforms, like PathFX, for hypothesis generation.

Citations: 0
Multimodal Bond Reconstruction toward Generative Molecular Design
IF 5.3 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-09 DOI: 10.1021/acs.jcim.5c03052
Jian Wang and Nikolay V. Dokholyan*

Generative models such as diffusion-based approaches have transformed de novo drug design by enabling rapid generation of novel molecular structures in both 2D and 3D formats. However, accurate reconstruction of chemical bonds, especially from distorted geometries produced by generative models, remains a critical challenge. Here, we present YuelBond, a multimodal graph neural network framework for robust bond reconstruction across three key scenarios: (i) recovery of bonds from accurate 3D atomic coordinates, (ii) reconstruction of chemically valid bonds in crude de novo generated compounds (CDGs) with perturbed geometries, and (iii) reassignment of bond orders in 2D topological graphs. YuelBond outperforms traditional rule-based methods such as RDKit, achieving 98.4% F1 score on standard 3D structures and maintaining strong performance (92.7% F1 score) on distorted CDGs, even when RDKit fails in most cases. Our results demonstrate that YuelBond enables accurate and reliable bond reconstruction from imperfect molecular data, bridging a critical gap in generative drug discovery pipelines.

Citations: 0