
Journal of Chemical Information and Modeling: Latest Articles

A Relative Binding Free Energy Framework for Structurally Dissimilar Molecules.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-17 DOI: 10.1021/acs.jcim.5c02204
Hsu-Chun Tsai, Shi Zhang, Tai-Sung Lee, Timothy J Giese, Charles Lin, James Xu, Yinhui Yi, Darrin M York, Abir Ganguly, Albert C Pan
Relative binding free energy (RBFE) calculations, widely used to predict the potencies of congeneric small molecules binding to a protein receptor, can greatly increase the efficiency of the hit-to-lead and lead optimization stages of the drug discovery process. Traditional RBFE methods, however, cannot be easily applied to small molecules lacking a common core or binding mode, precluding their use in a challenging but crucial component of many drug discovery campaigns. In principle, an absolute binding free energy (ABFE) method can be applied to such molecules, but ABFE often suffers from high computational cost and poor statistical convergence due to the large amount of additional sampling required when compared to RBFE. Here, we introduce core-hopping binding free energy (CBFE) calculations, a computationally efficient framework for the accurate determination of relative binding free energies between small molecules with different cores, leveraging several recently developed techniques such as Alchemical Enhanced Sampling (ACES) with optimized transformation pathways and flexible λ-spacing, as well as λ-dependent Boresch restraints. We benchmark the performance of CBFE across 4 protein systems consisting of 56 small molecules, and find that the results are consistent with RBFE for a congeneric series of ligands and offer considerable improvement in computational cost and precision relative to ABFE results for a series of small molecules with diverse cores and binding modes. All CBFE-related developments are fully implemented in the GPU-accelerated AMBER free energy module (pmemd.cuda) and are available as part of the latest official AMBER release.
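The relationship between the relative and absolute quantities discussed in this abstract can be sketched with the standard thermodynamic cycle. This is a minimal illustration only, not the CBFE implementation in AMBER: the function names and all numerical values are hypothetical, and energies are assumed to share one unit (for example kcal/mol).

```python
# Minimal sketch of the standard alchemical thermodynamic cycle.
# All names and numbers here are illustrative assumptions; nothing
# below is taken from the paper or from the AMBER code base.

def rbfe_from_legs(dg_complex, dg_solvent):
    """Relative binding free energy for A -> B.

    dg_complex: free energy of transforming A into B in the bound state.
    dg_solvent: free energy of the same transformation in solvent.
    """
    return dg_complex - dg_solvent


def rbfe_from_absolute(dg_bind_a, dg_bind_b):
    """The same quantity expressed through two absolute binding free energies."""
    return dg_bind_b - dg_bind_a


def cycle_closure_error(ddg_edges):
    """Sum of relative free energies around a closed cycle (ideally zero)."""
    return sum(ddg_edges)


if __name__ == "__main__":
    # Hypothetical values for a single ligand pair.
    print(rbfe_from_legs(dg_complex=-3.2, dg_solvent=-1.7))
    print(cycle_closure_error([-1.5, 0.7, 0.8]))
```

A nonzero cycle closure error is a common diagnostic of incomplete sampling, which is one reason precision is reported alongside cost when comparing RBFE-style and ABFE-style protocols.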
Citations: 0
G.AI.A: An Integrated Machine-Learning Platform for Predicting Bioaccumulation and Ecotoxicity of Pharmaceuticals.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-16 DOI: 10.1021/acs.jcim.5c02286
Evangelos Tsoukas, Michail Papadourakis, Eleni Chontzopoulou, Spyridon Vythoulkas, Christos Didachos, Dionisis Cavouras, Panagiotis Zoumpoulakis, Minos-Timotheos Matsoukas
Pharmaceutical pollution in aquatic environments poses a significant ecological threat due to the accumulation of bioactive compounds from human and veterinary sources. In support of the EU Green Deal's Chemicals Strategy for Sustainability, this study presents a computational framework for predicting two key environmental risk indicators in fish: bioconcentration and ecotoxicity. Bioconcentration, quantified by the bioconcentration factor (BCF), reflects a chemical's tendency to accumulate in organisms, while ecotoxicity is assessed via the median lethal concentration (LC50) over defined exposure periods. We developed two high-performing machine learning (ML) models, achieving ROC AUC scores of 94.60% for bioconcentration and 96.06% for ecotoxicity, validated across both internal and external data sets. To expand the scope of risk evaluation, we incorporated metabolite prediction using the SyGMa tool, selected after benchmarking multiple alternatives. This enables the assessment of both parent compounds and their potentially toxic metabolites. Model interpretability was enhanced through molecular fingerprint analysis, which identified structural features associated with toxicity and accumulation, informing the early stages of drug design. To support practical implementation, we introduced G.AI.A (https://gaiatox.eu/), an intuitive web platform that allows users to input Simplified Molecular Input Line Entry System (SMILES) strings for rapid prediction of environmental risk end points. The application domain of G.AI.A lies in predictive toxicology, enabling researchers and regulatory bodies to assess the toxicological profiles of small organic compounds, excluding those containing heavy metals, by analyzing their chemical structures. The platform supports batch processing and offers interactive visualizations, facilitating compound screening and early stage environmental risk assessment. 
By integrating predictive modeling with interpretability and usability, our framework advances green-by-design pharmaceutical development and contributes to sustainable chemical management.
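Two quantities named in this abstract lend themselves to a short worked sketch: the bioconcentration factor, taken here as the ratio of the concentration in the organism to that in water, and the ROC AUC used to score the classifiers. The code below is a generic illustration under those assumptions. It does not reproduce the G.AI.A models, and the labels, scores, and concentrations are invented.

```python
# Generic sketch: bioconcentration factor and a rank-based ROC AUC.
# Function names and all input values are hypothetical.

def bioconcentration_factor(conc_organism, conc_water):
    """BCF as the ratio of the two concentrations (consistent units assumed)."""
    if conc_water <= 0:
        raise ValueError("water concentration must be positive")
    return conc_organism / conc_water


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(bioconcentration_factor(conc_organism=50.0, conc_water=0.5))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

The pairwise form is quadratic in the number of samples, which is fine for a small check; a library routine would normally be used for full data sets.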
Citations: 0
DynoPore: A Package to Analyze Molecular Dynamics Trajectories of Confined Liquids.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-16 DOI: 10.1021/acs.jcim.5c01979
Samanvitha Kunigal Vijaya Shankar, Christopher P Ewels, Yann Claveau
We present a Python package, DynoPore, for studying liquids confined in cylindrical and slit-like geometries. Structural analysis functions such as density profiles and radial distribution functions are included to facilitate the understanding of the environment and local structure of liquid molecules within the confined systems. For dynamics, DynoPore includes region-resolved mean-squared displacement and lifetime functions to investigate molecular motion in different regions of the pore. For ionic systems, DynoPore also offers Nernst-Einstein and Einstein-Helfand conductivity analysis functions. By combining these structural and dynamical analysis tools in a single, user-friendly framework, DynoPore delivers a convenient and comprehensive package to analyze confined liquids.
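As a rough illustration of what a region-resolved mean-squared displacement involves, the sketch below averages squared displacements over time origins and keeps only particles that start inside a chosen region. It does not use or imitate the DynoPore API: the function names, the choice of the z axis as the pore axis, and the rule of assigning a particle to a region by its position at the time origin are all assumptions, and coordinates are assumed to be unwrapped.

```python
# Sketch of a region-resolved mean-squared displacement (MSD).
# Not the DynoPore API: names and the region rule are hypothetical.
import math


def mean_squared_displacement(frames, lag, in_region=None):
    """Average |r(t0 + lag) - r(t0)|^2 over time origins and particles.

    frames: list of frames, each a list of (x, y, z) unwrapped positions.
    in_region: optional predicate applied to the position at the time origin.
    """
    total = 0.0
    count = 0
    for t0 in range(len(frames) - lag):
        for start, end in zip(frames[t0], frames[t0 + lag]):
            if in_region is not None and not in_region(start):
                continue
            total += sum((b - a) ** 2 for a, b in zip(start, end))
            count += 1
    if count == 0:
        raise ValueError("no displacement samples for this lag and region")
    return total / count


def within_radius(r_max):
    """Predicate selecting particles closer than r_max to the z axis."""
    def predicate(position):
        x, y, _z = position
        return math.hypot(x, y) < r_max
    return predicate


if __name__ == "__main__":
    # Two particles: one drifting along the pore axis, one fixed near the wall.
    frames = [
        [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
        [(0.0, 0.0, 1.0), (5.0, 0.0, 0.0)],
        [(0.0, 0.0, 2.0), (5.0, 0.0, 0.0)],
    ]
    print(mean_squared_displacement(frames, lag=1))
    print(mean_squared_displacement(frames, lag=1, in_region=within_radius(1.0)))
```

Assigning the region at the time origin is only one possible convention; requiring a particle to stay in the region for the whole lag is another, and the two can give different results near region boundaries.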
Citations: 0
Prediction of Atropisomerism for Drug-like Molecules.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-16 DOI: 10.1021/acs.jcim.5c02720
Ty Balduf, Philip A Gerken, Mee Y Shelley, Mark A Watson, M Chandler Bennett, Mats Svensson, Abba E Leffler, Art Bochevarov
A multistep computational workflow that accurately assigns organic drug-like molecules to one of three atropisomer classes on the basis of computed barrier heights has been developed. The workflow identifies rotatable bonds and applies progressively more accurate types of calculations to the eligible rotational degrees of freedom. An initial energy scan with a force field (OPLS4) is followed by a similar scan that uses an energy function driven by a neural network model (QRNN-TB) trained on density functional theory (DFT) energies. The maxima corresponding to the potentially stereogenic rotatable bonds identified at this point are further processed by applying a transition state search at the QRNN-TB level of theory. Finally, ωB97X-D3/def2-TZVP(-f) DFT energies are computed for all located extrema. The accuracy of the predicted rotational barriers was benchmarked against ωB97M-V/cc-pVTZ and DLPNO-CCSD(T)/def2-TZVPP energies with excellent correlations. The automated protocol classifies organic molecules into atropisomeric classes with a greater than 90% success rate when applied to a test set of 65 molecules containing rotationally restricted torsions (68 torsions in total). We anticipate that the balance of speed and accuracy in this method will make it conducive to production use in drug discovery programs.
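To make the link between a computed barrier height and an atropisomer class concrete, the sketch below converts a barrier into an interconversion half-life with the Eyring equation and then bins it into three classes. The abstract does not state the class boundaries, so the 20 and 30 kcal/mol cutoffs used as defaults are an assumption based on a commonly cited convention, and the transmission coefficient is assumed to be 1. None of this reproduces the workflow described above.

```python
# Sketch: Eyring half-life and a three-way classification by barrier height.
# The cutoffs are assumed defaults, not values taken from the paper.
import math

BOLTZMANN = 1.380649e-23      # J/K
PLANCK = 6.62607015e-34       # J s
GAS_CONSTANT = 8.314462618    # J/(mol K)
JOULES_PER_KCAL = 4184.0


def eyring_rate(barrier_kcal_mol, temperature=298.15):
    """First-order rate constant in 1/s (transmission coefficient of 1 assumed)."""
    barrier_j_mol = barrier_kcal_mol * JOULES_PER_KCAL
    prefactor = BOLTZMANN * temperature / PLANCK
    return prefactor * math.exp(-barrier_j_mol / (GAS_CONSTANT * temperature))


def half_life_seconds(barrier_kcal_mol, temperature=298.15):
    """Half-life of a first-order process with the Eyring rate constant."""
    return math.log(2.0) / eyring_rate(barrier_kcal_mol, temperature)


def atropisomer_class(barrier_kcal_mol, low_cutoff=20.0, high_cutoff=30.0):
    """Return 1, 2, or 3 by barrier height; the cutoffs are hypothetical defaults."""
    if barrier_kcal_mol < low_cutoff:
        return 1
    if barrier_kcal_mol < high_cutoff:
        return 2
    return 3


if __name__ == "__main__":
    for barrier in (15.0, 25.0, 32.0):
        print(barrier, atropisomer_class(barrier), half_life_seconds(barrier))
```

Because the half-life depends exponentially on the barrier, an error of a few kcal/mol in the computed barrier can move a molecule across a class boundary, which is why the accuracy of the final energy evaluation matters for the classification.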
Citations: 0
scII: Dual-Threshold Adaptive Integration of Single-Cell Multiomics Data Driven by Imputation.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-15 DOI: 10.1021/acs.jcim.5c02396
Yi Zhang, Yuru Li, Zhicheng Jin, Ye Tian, Chen Su
Single-cell multiomics technologies provide unprecedented opportunities to dissect cellular heterogeneity by capturing multidimensional information on complex cellular states and regulatory networks. However, challenges such as high dimensionality, extreme data sparsity, and modality-specific discrepancies hinder the accuracy, interpretability, and scalability of the existing integration methods. Existing integration paradigms, including horizontal, vertical, and diagonal strategies, are further limited by their inability to fully capture nonlinear biological relationships, their reliance on high-quality data, and their substantial computational demands. Here, we present scII (Dual-Threshold Adaptive Integration of Single-Cell Multiomics Data Driven by Imputation), an adaptive framework designed to integrate gene expression (scRNA-seq) and chromatin accessibility (scATAC-seq) data. Our approach is built on several key conceptual innovations: (i) scRNA-seq-guided signal imputation to enhance information integrity in scATAC-seq; (ii) a multilayer perceptron with the Maxout activation function to improve the modeling of complex nonlinear relationships and mitigate the vanishing gradient problem; (iii) a dynamic dual-threshold adaptive selection mechanism that jointly evaluates cross-modality feature similarity and classification reliability to select high-quality cells; and (iv) Bayesian Information Criterion (BIC)-based optimization to dynamically determine the number of Gaussian Mixture Model components according to data distribution, thereby eliminating reliance on manually preset parameters. Extensive experiments on multiple real-world and simulated data sets demonstrate that scII not only enables efficient integration of unpaired scRNA-seq and scATAC-seq data but also achieves accurate transfer of cell-type annotations, allowing high-precision cell-type prediction for scATAC-seq.
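Item (iv) above, choosing the number of Gaussian Mixture Model components by the Bayesian Information Criterion, can be sketched generically as follows. The code assumes full covariance matrices and takes the fitted log-likelihoods as given; the function names and the example log-likelihood values are hypothetical and are not taken from scII.

```python
# Sketch: selecting the number of GMM components by BIC.
# Assumes full covariances; the log-likelihoods below are invented.
import math


def gmm_free_parameters(n_components, n_features):
    """Free parameters of a full-covariance Gaussian mixture."""
    weights = n_components - 1
    means = n_components * n_features
    covariances = n_components * n_features * (n_features + 1) // 2
    return weights + means + covariances


def bic(log_likelihood, n_parameters, n_samples):
    """BIC = k * ln(n) - 2 * ln(L); lower is better."""
    return n_parameters * math.log(n_samples) - 2.0 * log_likelihood


def select_components(log_likelihoods, n_features, n_samples):
    """Return the component count with the lowest BIC, plus all scores.

    log_likelihoods: dict mapping component count -> total log-likelihood.
    """
    scores = {
        k: bic(ll, gmm_free_parameters(k, n_features), n_samples)
        for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    fitted = {1: -1500.0, 2: -1200.0, 3: -1195.0}  # hypothetical fits
    best, scores = select_components(fitted, n_features=2, n_samples=1000)
    print(best, scores)
```

In this made-up example the third component improves the likelihood only slightly, so the ln(n) penalty on its extra parameters outweighs the gain and the two-component model is preferred.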
Citations: 0
Conformational Transition of the CARF Domain Driven by Binding Free Energy
IF 5.3 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-14 DOI: 10.1021/acs.jcim.5c02521
Guodong Hu*, Jin Qian, Chengfei Cai, and Jianzhong Chen*

Type III CRISPR systems provide adaptive immunity against invasion of foreign nucleic acids by generating cyclic oligoadenylate (cAₙ) second messengers, which activate effector proteins containing CRISPR-associated Rossmann fold (CARF) domains. The apo form of CARF adopts a closed state, distinct from its cA₄-bound open state conformation. To investigate the conformational transition, we performed multiple types of molecular dynamics (MD) simulations, revealing a unidirectional conformational shift toward the closed state. This transition was hindered by reduced flexibility in cA₄-binding residues. Notably, the conformational change primarily occurs between the two monomers, with minimal structural rearrangement within individual monomers. Comparative analysis showed that while the number of hydrogen bonds and contacts between CARF and cA₄ decreases in the closed state, intermonomer interactions are strengthened. Binding free-energy calculations between the two chains of CARF further confirmed higher affinity in the closed state. Our findings support an energy-driven conformational change model, providing insights for optimizing CRISPR-based genetic manipulation tools.

Citations: 0
BOLD-GPCRs: A Transformer-Powered App for Predicting Ligand Bioactivity and Mutational Effects across Class A GPCRs
IF 5.3 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-14 DOI: 10.1021/acs.jcim.5c01858
Davide Provasi, Kirill Konovalov, Nicholas Riina, Olivia Cullen, and Marta Filizola*

G Protein-Coupled Receptors (GPCRs) are important targets for drug discovery owing to their ability to respond to a broad range of stimuli and their involvement in numerous pathologies. Although traditional ligand-based and structure-based approaches have facilitated the development of effective therapeutics for many GPCRs, these approaches often fall short when applied to receptors with limited ligand or structural data. This limitation highlights the critical need for advanced strategies capable of accurately predicting ligand bioactivity across the entire GPCR family, especially for understudied receptor subtypes. In this study, we introduce BOLD-GPCRs (BERT-Optimized Ligand Discovery for GPCRs), a deep learning framework designed to enhance the prediction of ligand bioactivity across class A GPCRs. Accessible via a user-friendly web interface, BOLD-GPCRs employs transfer learning and leverages curated data sets of known class A GPCR ligands, receptor sequences, and signaling-relevant mutations. By integrating dense neural network classifiers with transformer-based protein language models, BOLD-GPCRs captures complex relationships between receptor sequence/function and ligand activity. Our results demonstrate that BOLD-GPCRs achieves robust predictive performance for both ligand bioactivity and mutational effects across a broad range of class A GPCRs, underscoring its potential as a valuable tool for ligand discovery, especially for poorly characterized receptors.

Citations: 0
Prediction of Protein–Ligand Binding Affinities Using Atomic Surface Site Interaction Points
IF 5.3 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-14 DOI: 10.1021/acs.jcim.5c02628
Katarzyna J. Zator, Maria Chiara Storer, and Christopher A. Hunter*

Atomic surface site interaction points (AIPs), which were previously used to predict association constants for synthetic host–guest systems, have been extended to protein–ligand complexes. AIP descriptions of protein binding sites were obtained by combining a library of precomputed AIP descriptors for all protein functional groups with a graph-based substructure matching algorithm. The corresponding AIP descriptions of ligands were obtained directly by footprinting the molecular electrostatic potential surface calculated using density functional theory. These AIP descriptions were projected onto X-ray crystal structures of protein–ligand complexes to identify pairs of AIPs that were sufficiently close in space to constitute an intermolecular interaction. The overall free energy of binding was calculated by summing the contributions of each AIP contact and the associated desolvation. Application to the 94 complexes involving uncharged ligands in the CASF benchmark data set showed that the method achieves a Pearson correlation coefficient of 0.76 and an RMSD of 11 kJ mol⁻¹ for absolute free energies of binding.
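The additive scoring and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of a contact as an (interaction, desolvation) pair in kJ/mol, and every number below are assumptions made purely for illustration.

```python
import math


def binding_free_energy(contacts):
    """Sum per-contact terms; each contact is a hypothetical
    (interaction, desolvation) pair in kJ/mol."""
    return sum(interaction + desolvation for interaction, desolvation in contacts)


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured))
    if spread_p == 0 or spread_m == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_p * spread_m)


def rmsd(predicted, measured):
    """Root-mean-square deviation between predictions and measurements."""
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy values only; they do not come from the paper or from CASF.
    toy_contacts = [(-10.0, 3.0), (-5.0, 1.0)]
    print("Toy binding free energy (kJ/mol):", binding_free_energy(toy_contacts))

    toy_predicted = [-30.0, -25.0, -41.0, -18.0]
    toy_measured = [-28.0, -29.0, -37.0, -20.0]
    print("Pearson r:", pearson_r(toy_predicted, toy_measured))
    print("RMSD (kJ/mol):", rmsd(toy_predicted, toy_measured))
```

Reporting both metrics is useful because they answer different questions: the correlation coefficient measures how well the ranking of complexes is reproduced, while the RMSD measures the absolute error in kJ/mol.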

Journal of Chemical Information and Modeling 2026, 66 (2), 1097–1105. DOI: 10.1021/acs.jcim.5c02628. Open-access PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02628
Citations: 0
Glycans Modulate the Adsorption of RBD Glycoproteins on Polarizable Surfaces.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-14 DOI: 10.1021/acs.jcim.5c02363
Antonio M Bosch-Fernández, Willy Menacho, Rubén Pérez, Horacio V Guzman
Numerous respiratory viruses are transmitted via airborne microdroplets that frequently adhere to fomites. Understanding the behavior of these phenomenologically rich bio-material interfaces remains an open issue. Here, we tackle the complex interplay between glycans and protein conformational dynamics during adsorption onto polarizable surfaces, focusing on the potential of glycans as molecular interaction modulators. We employ molecular dynamics simulations to dissect the interactions of the Receptor Binding Domain (RBD) glycoproteins from different SARS-CoV-2 variants of concern (VoC), in both open and closed conformations, with polarizable planar interfaces. Advanced analysis using 2D space reveals distinct adsorption mechanisms depending on the initial loci of the glycan within the protein wall. Hydrophobic surfaces facilitate stable adsorption for both RBD conformations. Conversely, hydrophilic surfaces exhibit reduced adsorption, particularly for the closed-RBD, where glycans predominantly form hydrogen bonds. Glycans significantly modulate closed-RBD adsorption, either enhancing it by permanent tethering or impeding it depending on the initial conformation and protein mutations (Omicron). Results for the individual RBDs are consistent with scaled-up simulations for the complete spike ectodomain glycoprotein. Our findings unveil novel glycan-mediated adsorption phenomena and provide fundamental insights into glycoprotein-surface interactions, paving the way for understanding glycan roles in glycoprotein-fomite adsorption, protein aggregation, and recognition at polarizable biological interfaces.
Journal of Chemical Information and Modeling, published 2026-01-14. https://doi.org/10.1021/acs.jcim.5c02363
Citations: 0
Uncertainty Quantification in Molecular Machine Learning for Property Predictions under Data Shifts
IF 5.3 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2026-01-14 DOI: 10.1021/acs.jcim.5c02381
Raquel Parrondo-Pizarro, Jessica Lanini, and Raquel Rodríguez-Pérez*

Drug discovery and medicinal chemistry efforts are increasingly influenced by machine learning (ML), with compound property prediction as a central application. ML models have demonstrated strong performance in predicting various compound properties from chemical structure. However, these models can exhibit varying levels of prediction error, making uncertainty quantification (UQ) essential for informed decisions. Standard UQ metrics include the distance to the molecules in the training set and prediction variance, obtained through methods such as model ensembles or Bayesian modeling. Although several UQ methodologies have been developed in recent years, no single approach has consistently outperformed the others. Herein, we present a comprehensive benchmark of UQ strategies for ML-based prediction of absorption, distribution, metabolism, and excretion (ADME) properties, using both in-house and public data sets. We employed the recently introduced UNIQUE (UNcertaInty QUantification bEnchmarking) framework and evaluated UQ method performance under data shifts. Our findings indicate that data-based UQ metrics (e.g., chemical distance) and model-based UQ metrics (e.g., predicted value and variance) may capture complementary aspects of uncertainty. Their combination through error models, designed to predict the original ML model’s error, yielded higher-quality uncertainty estimates. These error models emerged as a promising strategy for enhancing UQ, showing robustness under various degrees and types of data shift. Taken together, our work highlights the potential of combining diverse UQ metrics and error modeling to improve reliability in molecular property prediction. By establishing standardized evaluation setups and assessing UQ under data shifts, we provide a foundation for future UQ method development and benchmarking in the field.
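The two families of UQ metrics named in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only and does not use the UNIQUE framework's API: the choice of Tanimoto distance as the "chemical distance", the representation of fingerprints as sets of on-bit indices, and all function names are assumptions.

```python
import statistics


def tanimoto_distance(fp_a, fp_b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        # Two empty fingerprints are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(fp_a & fp_b) / len(union)


def distance_to_training_set(query_fp, training_fps):
    """Data-based UQ metric: distance to the nearest training molecule."""
    return min(tanimoto_distance(query_fp, fp) for fp in training_fps)


def ensemble_mean_and_variance(predictions):
    """Model-based UQ metric: mean and sample variance of the predictions
    made by an ensemble of models (needs at least two predictions)."""
    return statistics.mean(predictions), statistics.variance(predictions)


if __name__ == "__main__":
    # Hypothetical fingerprints and ensemble outputs, for illustration only.
    training = [{1, 2, 3, 8}, {2, 3, 4}, {10, 11}]
    query = {2, 3, 5}
    print("Nearest-neighbor distance:", distance_to_training_set(query, training))
    print("Ensemble mean, variance:", ensemble_mean_and_variance([5.1, 4.8, 5.6]))
```

An error model in the sense described above would take values such as these two as input features and be trained to predict the absolute error of the original model; that fitting step is omitted here.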

Journal of Chemical Information and Modeling 2026, 66 (2), 923–935. DOI: 10.1021/acs.jcim.5c02381. Open-access PDF: https://pubs.acs.org/doi/pdf/10.1021/acs.jcim.5c02381
Citations: 0