Pub Date: 2026-03-09; DOI: 10.1021/acs.jcim.6c00060
Florence Szczepaniak, Donghyuk Suh, Wonpil Im
Recent advances in machine learning (ML) have enabled new developments in molecular dynamics simulation. Neural network potentials (NNPs) trained on quantum mechanical (QM) data provide highly accurate descriptions of drug-like molecules. Analogous to a QM and molecular mechanical (QM/MM) approach, hybrid ML/MM simulations employ NNPs to describe a localized region of the system, such as a ligand, while the rest of the system is treated using classical MM force fields. This hybrid framework enables simulations of protein-ligand complexes with near-QM accuracy for the ligand at a substantially reduced computational cost. CHARMM-GUI Hybrid ML/MM Builder automates the preparation of system and input files required for hybrid ML/MM modeling and simulation. This new module generates all necessary files to simulate protein-ligand complexes in solution or membrane using TorchANI-AMBER and OpenMM-ML. Currently supported NNPs include MACE and ANI. In this paper, we present Hybrid ML/MM Builder and representative application systems that demonstrate its usage and capabilities.
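The energy partition behind a hybrid ML/MM description can be illustrated with a toy calculation. The abstract does not say which coupling scheme the generated inputs use, so the subtractive mechanical-embedding form below is an assumption, and `toy_mm_energy`, `toy_ml_energy`, and `hybrid_energy` are hypothetical stand-ins rather than a real force field, NNP, or CHARMM-GUI API.

```python
from itertools import combinations

def toy_mm_energy(coords):
    """Stand-in 'force field': sum of squared pair distances (not a real MM model)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(coords, 2)
    )

def toy_ml_energy(coords):
    """Stand-in 'NNP': twice the toy MM energy (not a real neural network)."""
    return 2.0 * toy_mm_energy(coords)

def hybrid_energy(coords, ml_atoms, e_mm=toy_mm_energy, e_ml=toy_ml_energy):
    """Assumed subtractive scheme: E = E_MM(all) - E_MM(ML region) + E_ML(ML region).

    Interactions inside the ML region (e.g. the ligand) come from the ML model;
    everything else, including ligand-environment coupling, stays at the MM level.
    """
    region = [coords[i] for i in ml_atoms]
    return e_mm(coords) - e_mm(region) + e_ml(region)

# Three "atoms"; the first two form the ML region.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
total = hybrid_energy(coords, ml_atoms=[0, 1])
```

In this scheme only the internal energy of the selected region changes level of theory, which is why the cost stays close to that of a classical simulation when the region is small.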
Title: CHARMM-GUI Hybrid ML/MM Builder for Hybrid Machine Learning and Molecular Mechanical Modeling and Simulations.
Journal: Journal of Chemical Information and Modeling
The cell cycle relies on sequential activation of cyclin-dependent kinases (CDKs) by phase-specific cyclins. Previously, we proposed that their conformations and activation speed are tuned to the needs of their respective phases. We demonstrated this principle by using molecular dynamics simulations to evaluate the slower activation and catalytic kinetics of Cyclin-D/CDK4 during the long G1 phase compared to the rapid activation of Cyclin-E/CDK2 in the brief G1/S transition, and the higher intrinsic activity of Cyclin-D/CDK6 required for rapid hematopoietic cell proliferation. Here, we ask whether this principle also holds for subsequent cell cycle phases. We explore how the dynamic behavior of structurally similar Cyclin-E/CDK2, Cyclin-A/CDK2, and Cyclin-A/CDK1 controls their distinct tasks, and how the cell ensures that Cyclin-A/CDK2 and Cyclin-A/CDK1, which share the same allosteric effector Cyclin-A, avoid redundantly triggering S- and M-phase events out of order. Through molecular dynamics simulations, we find that their functional differences relate to their distinct conformational energy landscapes and kinetic profiles. Unlike CDK1 complexes, whose interface is plastic, the Cyclin-E/CDK2 complex, governing the G1/S transition, is conformationally constrained by a stable interface and is less dependent on its catalytic outputs. In contrast, the high catalytic efficiency of Cyclin-A/CDK2 can support rapid phosphorylation of S-phase replication factors, thereby preventing DNA rereplication through preorganization of the CDK2 DFG motif. We translate our results to the clinic by proposing an innovative allosteric degrader strategy for selective Cyclin-E degradation. We further validate our design workflow by reproducing the ternary complex of a known CDK2 degrader and then applying the approach to model an allosteric degrader, thereby establishing the structural parameters required to target this specific Cyclin-E/CDK2-cereblon conformational state.
Title: Cyclin-E/A/CDK1/2 Kinetic Landscapes Drive Cell Cycle Phase-Specific Progression and Guide Cyclin-E Degradation Strategy.
Authors: Wengang Zhang, Devin Bradburn, Yonglan Liu, Hyunbum Jang, Mardo Kõivomägi, Ruth Nussinov
Journal: Journal of Chemical Information and Modeling
Pub Date: 2026-03-09; DOI: 10.1021/acs.jcim.6c00279
Pub Date: 2026-03-09; DOI: 10.1021/acs.jcim.6c00528
Alejandro Blanco-Gonzalez, William Betancourt, Ryan Michael Snyder, Shi Zhang, Timothy J. Giese, Zeke A. Piskulich, Andreas W. Götz, Kenneth M. Merz Jr., Darrin M. York, Hasan Metin Aktulga, Madushanka Manathunga
General force fields such as the General Amber Force Field (GAFF) are designed for broad applicability and are widely used in protein–ligand binding simulations in structure-based drug discovery. However, force field parameters are not always transferable across ligand molecules, and custom reparameterization is sometimes necessary for accurate binding free energy simulations. This is especially true for torsion parameters, which are highly dependent on stereoelectronic and steric effects. Here, we report a novel, flexible, and user-friendly computational tool, the Automated Force Field Developer and Optimizer (AFFDO) platform, which generates accurate, tailored GAFF2 torsion parameters for drug-like molecules. For a given ligand, AFFDO selects the most important torsions, carries out GPU-accelerated density functional theory calculations to collect reference data, and fits torsion terms using a fast gradient-based optimizer that leverages automatic differentiation. We benchmark AFFDO by parametrizing a series of drug-like molecules and carrying out protein–ligand relative binding free energy (RBFE) simulations. The results show that, within a reasonable computational time, AFFDO can significantly improve the agreement of GAFF2 torsion parameters with QM reference data, which in some cases translates into better agreement with experimental RBFE values.
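The torsion-fitting step can be sketched as a small gradient-descent fit of a cosine series to reference energies. This is not AFFDO's code: the functional form V(φ) = Σ k_n [1 + cos(nφ)] with phases fixed at zero, the hand-written analytic gradient (in place of automatic differentiation), the learning rate, and the synthetic "reference" scan are all assumptions made for illustration.

```python
import math

def torsion_energy(phi, ks):
    """Cosine series V(phi) = sum_n k_n * (1 + cos(n * phi)), n = 1..len(ks)."""
    return sum(k * (1.0 + math.cos((n + 1) * phi)) for n, k in enumerate(ks))

def fit_torsion(phis, ref_energies, n_terms=3, lr=0.05, steps=2000):
    """Minimise the mean squared error between model and reference energies
    with plain gradient descent on the force constants k_n."""
    ks = [0.0] * n_terms
    m = len(phis)
    for _ in range(steps):
        # Residuals are evaluated once per step, so all k_n update together.
        resid = [torsion_energy(p, ks) - e for p, e in zip(phis, ref_energies)]
        for n in range(n_terms):
            grad = 2.0 / m * sum(
                r * (1.0 + math.cos((n + 1) * p)) for r, p in zip(resid, phis)
            )
            ks[n] -= lr * grad
    return ks

# Synthetic dihedral scan in 10-degree steps, standing in for QM reference data.
phis = [2.0 * math.pi * i / 36 for i in range(36)]
true_ks = [1.2, 0.5, 0.3]
reference = [torsion_energy(p, true_ks) for p in phis]
fitted = fit_torsion(phis, reference)
```

With real scans the reference energies are usually relative to a minimum and include non-torsional contributions, so a production fit would subtract the rest of the force field energy first and might also optimise phases.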
Title: Automated Force Field Developer and Optimizer Platform: Torsion Reparameterization
Journal: Journal of Chemical Information and Modeling
Pub Date: 2026-03-06; DOI: 10.1021/acs.jcim.5c02709
Peiyao Li, Lan Hua, Ye Liu, Jun Zhu
Deep learning has accelerated drug discovery by enabling large-scale virtual screening, but current models often act as "black boxes" and provide no formal guarantees about prediction reliability. This limitation is particularly critical for compound-protein interaction (CPI) prediction, where data sets are highly imbalanced and erroneous predictions can lead to costly failures. Here we introduce ConfBiXtCPI, an integrated framework that unifies accurate prediction, interpretability, and statistically rigorous uncertainty quantification. At its core is a bidirectional cross-attention transformer that captures molecular recognition patterns from sequence-level inputs, achieving state-of-the-art accuracy across multiple benchmarks. To address class imbalance and uncertainty, we incorporate Mondrian conformal prediction, which guarantees valid coverage for both majority and minority classes. Building on this, a conformal selection procedure enables principled control of the false discovery rate, allowing users to specify risk thresholds while maintaining discovery power. Beyond accuracy, ConfBiXtCPI provides mechanistic interpretability through attention maps that localize to biophysically relevant binding sites, and its uncertainty estimates support efficient active learning strategies. Together, these advances establish ConfBiXtCPI as a trustworthy and practical tool for guiding experimental validation and accelerating therapeutic discovery.
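Mondrian (class-conditional) conformal prediction can be sketched in a few lines. The calibration scores, the choice of nonconformity measure (one minus the predicted probability of the candidate label), and the function names are illustrative assumptions, not ConfBiXtCPI's implementation.

```python
def mondrian_p_values(calib_scores, calib_labels, test_scores):
    """Class-conditional conformal p-values.

    calib_scores[i] is the nonconformity of calibration example i for its true
    label; test_scores[y] is the nonconformity of the test example if it were
    labelled y. Each candidate label is compared only with calibration examples
    of that same class, which is what gives per-class validity.
    """
    p_values = {}
    for label, alpha in test_scores.items():
        same_class = [s for s, l in zip(calib_scores, calib_labels) if l == label]
        at_least_as_strange = sum(s >= alpha for s in same_class)
        p_values[label] = (at_least_as_strange + 1) / (len(same_class) + 1)
    return p_values

def prediction_set(p_values, epsilon):
    """Keep every label whose p-value exceeds the significance level epsilon."""
    return {label for label, p in p_values.items() if p > epsilon}

# Toy calibration set: four non-interacting (0) and three interacting (1) pairs.
calib_scores = [0.1, 0.2, 0.3, 0.4, 0.2, 0.6, 0.7]
calib_labels = [0, 0, 0, 0, 1, 1, 1]
# Test pair with predicted P(interaction) = 0.7, so scores are 1 - P(label).
test_scores = {0: 0.7, 1: 0.3}
p_values = mondrian_p_values(calib_scores, calib_labels, test_scores)
pred = prediction_set(p_values, epsilon=0.2)
```

Because the minority class is calibrated against its own examples, its error rate is controlled separately instead of being averaged away by the majority class.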
Title: Trustworthy Compound-Protein Interaction Prediction with Interpretable and Conformalized Cross-Attention Transformers.
Journal: Journal of Chemical Information and Modeling
Pub Date: 2026-03-06; DOI: 10.1021/acs.jcim.5c02959
Wei Lin, Chi Chung Alan Fung
Accurate prediction of acute dermal toxicity is vital for the safe and effective development of contact drugs. While numerous deep learning models have been created to replace costly and ethically challenging animal toxicity tests, most approaches overlook the multiview information on molecules. To overcome this limitation, we introduce a novel model named MVIToxNet, which integrates multiview features from both molecular fingerprints and SMILES sequences. To capture the multiview information on SMILES, MVIToxNet incorporates character-level and atom-level features. In addition, byte-pair encoding tokenization is utilized to capture substructural details within molecules, allowing the model to differentiate similar SMILES by assigning distinct tokens to different substructures. Since the data sets in this study are small and imbalanced, we argue that selecting a single model based solely on the best validation performance may not reliably reflect the best generalization for test sets. Therefore, we propose a weighted model averaging approach that combines multiple trained models according to their top-K validation scores into one model, yielding an improved model for inference. Extensive experimental results demonstrate that MVIToxNet significantly outperforms existing baselines in acute dermal toxicity prediction, validating the effectiveness of utilizing multiview features and the weighted model averaging strategy. Furthermore, our proposed methods demonstrate the potential for data-driven model design.
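One way to read the weighted model averaging step is as a score-weighted average of the parameters of the K best checkpoints. The abstract does not give the weighting rule, so weights proportional to validation score, the checkpoint layout, and the function name below are assumptions.

```python
def weighted_average_top_k(checkpoints, k):
    """Merge the k checkpoints with the highest validation scores into one model.

    checkpoints: list of (validation_score, params), where params maps a
    parameter name to a flat list of floats with identical shapes across
    checkpoints. Weights are the scores normalised to sum to one (assumed rule).
    """
    top = sorted(checkpoints, key=lambda c: c[0], reverse=True)[:k]
    total = sum(score for score, _ in top)
    weights = [score / total for score, _ in top]
    reference = top[0][1]
    return {
        name: [
            sum(w * params[name][i] for w, (_, params) in zip(weights, top))
            for i in range(len(values))
        ]
        for name, values in reference.items()
    }

# Three toy checkpoints; the one with score 0.2 is dropped for k = 2.
checkpoints = [
    (0.6, {"w": [1.0, 0.0]}),
    (0.2, {"w": [5.0, 5.0]}),
    (0.4, {"w": [0.0, 1.0]}),
]
merged = weighted_average_top_k(checkpoints, k=2)
```

Averaging in parameter space only makes sense when the checkpoints share an architecture and lie close together, for example when they come from the same training run.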
Title: Integrating Multiview Information for Enhanced Deep Learning-Based Acute Dermal Toxicity Prediction.
Journal: Journal of Chemical Information and Modeling
With the rapid growth of high-throughput sequencing data, many proteins remain uncharacterized, while experimental validation is costly and time-consuming. Automatic Function Prediction (AFP) is thus urgently needed. Protein functions are complex and multilevel, with inherent interactions among features such as sequence, structure, and evolution. Existing methods relying on single-level representations or simple feature aggregation struggle to capture the hierarchical dependencies and semantic collaborative relationships in the Gene Ontology (GO) label system, limiting prediction accuracy and generalization. To overcome these challenges, we propose a Multi-View Collaboration Feature Fusion (MVCFF) framework, which leverages complementary features from multiple sequence perspectives to enhance protein function prediction. In MVCFF, a sequential feature extraction subnetwork is designed to capture view-specific information, incorporating both local patterns and long-range dependencies within amino acid sequences. Building on this, a multi-view collaboration paradigm is employed, enabling interactive learning of key positional information through integrated multi-view features and facilitating synergistic information fusion. The resulting multi-view representations are then fed into downstream label predictors to perform classification tasks. To further boost predictive accuracy, we introduce an extended version, MVCFF+, which combines the original MVCFF framework with sequence-similarity-based prediction methods via a weighted fusion strategy. Extensive experiments demonstrate that our approach substantially improves prediction performance, outperforming existing methods by a clear margin. The source code is publicly available at https://github.com/AGI-FBHC/MVCFF.
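The weighted fusion used by MVCFF+ can be pictured as a convex combination of per-term scores from the two predictors. The mixing weight `alpha`, the treatment of missing terms as zero, and the example scores are assumptions; the abstract does not specify them.

```python
def fuse_scores(model_scores, similarity_scores, alpha=0.5):
    """Combine per-GO-term scores from a learned model and a
    sequence-similarity method: alpha * model + (1 - alpha) * similarity.
    A term missing from one source contributes 0.0 from that source."""
    terms = set(model_scores) | set(similarity_scores)
    return {
        term: alpha * model_scores.get(term, 0.0)
        + (1.0 - alpha) * similarity_scores.get(term, 0.0)
        for term in terms
    }

# Example scores for two GO terms (values are made up for illustration).
model_scores = {"GO:0003824": 0.8, "GO:0005515": 0.4}
similarity_scores = {"GO:0003824": 0.6}
fused = fuse_scores(model_scores, similarity_scores, alpha=0.5)
```

In practice `alpha` would be tuned on a validation set, since similarity-based transfer tends to help most for proteins with close annotated homologs.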
Title: Multi-View Collaboration Feature Fusion for Protein Function Prediction.
Authors: Hailong Yang, Zhongyu Wang, Haijun Shi, Qiao Ning, Zhaohong Deng, Shudong Hu, Yanqi Zhong
Journal: Journal of Chemical Information and Modeling
Pub Date: 2026-03-06; DOI: 10.1021/acs.jcim.5c03057
Current enzyme thermostability prediction models are predominantly designed for cross-family generalization, with limited focus on hydroxylases, which restricts their accuracy and applicability in hydroxylase-specific thermostability design. In this study, we develop HyS-BST, a dedicated self-trained semisupervised framework for hydroxylase thermostability prediction. Leveraging a limited hydroxylase data set, HyS-BST integrates a self-training strategy with Bayesian dynamic tuning to achieve high-precision prediction of mutant thermostability in terms of ΔΔG. Experimental results demonstrate that after only ten training iterations, HyS-BST attains a coefficient of determination (R²) of 0.96, a Pearson correlation coefficient (PCC) of 0.98, and a root mean squared error (RMSE) as low as 0.06 on the test set. Compared with the optimal cross-family generalization model, HyS-BST improves PCC and RMSE by approximately 70%. Overall, this framework provides a specialized, efficient, and cost-effective solution for hydroxylase thermostability prediction, substantially reducing the candidate search space and experimental resources required for downstream validation.
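The three reported metrics follow standard definitions, sketched below on made-up ΔΔG values. The numbers and the function name are illustrative only and are unrelated to the HyS-BST results.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, Pearson correlation coefficient, and RMSE for paired values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)             # total sum of squares
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "pcc": cov / math.sqrt(ss_tot * var_p),
        "rmse": math.sqrt(ss_res / n),
    }

# Hypothetical measured vs. predicted values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(measured, predicted)
```

Note that R² and RMSE penalise systematic offsets while PCC does not, so the three numbers are best read together.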
Title: Hydroxylase Thermostability Prediction Based on Self-Trained Semisupervised Iteration and Bayesian Dynamic Tuning.
Authors: Sujuan Liu, Mengyu Yu, Lei Zhang, Dongyan Wen, Xiaotong Yu, Jianmei Luo, Chuanlei Zhang
Journal: Journal of Chemical Information and Modeling
Pub Date: 2026-03-05; DOI: 10.1021/acs.jcim.6c00102
Pub Date: 2026-03-05; DOI: 10.1021/acs.jcim.6c00390
Shiva Ghaemi, Amanda Consylman, Bo Pan, Alice Wu, Ashley Petersen, Gabe Chang, Diana McDonough, Mark Forman, Elise L Bezold, William M Wuest, Amarda Shehu, Liang Zhao, Kevin P C Minbiole
Quaternary ammonium compounds (QACs) are widely used antimicrobial disinfectants whose efficacy is threatened by increased bacterial resistance. Artificial intelligence-guided development of novel QACs is constrained by historically sparse structure-activity data and methods to generate novel chemical entities with bioactivity. This paper presents a comparative experimental study of two computational workflows designed to accelerate QAC discovery under data-limited conditions. Both workflows employ a topology-aware variational autoencoder to generate novel candidates. In Workflow 1, generated QAC structures were directly subjected to expert evaluation within a fixed time constraint through the systematic application of chemistry-domain decision criteria. In Workflow 2, generated candidates were first computationally filtered using predictive models trained to anticipate antimicrobial activity, advancing only molecules projected to be highly active against at least one bacterial strain for expert evaluation. This predictive filtering enabled the assessment of a larger, higher-quality candidate pool within the same time constraint. Comparative assessment of the compound sets resulting from the two workflows revealed substantial improvements in candidate quality: compounds deemed synthesis-worthy increased from 9% to 38%, while invalid outputs decreased from 21% to 0%. Experimental characterization of 29 selected compounds across both workflows yielded 11 novel QACs with experimentally validated minimum inhibitory concentrations of 1-32 μM against four bacterial pathogens. These results demonstrate that topology-aware generation coupled with computational prefiltering enables systematic navigation of data-scarce chemical spaces while respecting practical constraints on expert evaluation time.
Title: Topology-Aware Generation and Activity-Based Filtering: A Computational-Experimental Framework for Data-Scarce Quaternary Ammonium Compound Discovery.
Journal: Journal of Chemical Information and Modeling
Pub Date: 2026-03-05 DOI: 10.1021/acs.jcim.5c02819
Daniele Belletto,Stefano Scoditti,Stefano Borocci,Nico Sanna,Costantino Zazza,Emilia Sicilia
The efficacy of platinum(II) drugs, despite their wide use in clinical practice, is seriously limited by their well-known drawbacks. Octahedral Pt(IV) congeners are considered a sort of Holy Grail in cancer research: being significantly more inert, they should act as prodrugs and overcome the limitations of current platinum-based drugs, such as resistance and side effects. Additionally, their anticancer activity can be tuned through a proper choice of the axial ligands released inside cancer cells when these compounds are reduced, which could even allow them to work as multiaction agents. However, despite their very satisfactory anticancer effects, no Pt(IV) complex has been approved for clinical use. Because cell membrane permeation is the critical, and very poorly understood, step in the mechanism of action of any drug, investigating possible differences in behavior between four-coordinate Pt(II) and six-coordinate Pt(IV) complexes as they diffuse in a lipid bilayer might be of significant relevance. Here we report the outcomes of a biased molecular dynamics (MD) investigation of the permeation of cisplatin and three simple cisplatin Pt(IV) derivatives through a prototypical membrane model of human breast cancer cells. This comparative analysis of the passive diffusion of Pt(II) and Pt(IV) complexes was carried out to gain insight into the factors that favor or hinder membrane penetration and, ultimately, determine the efficacy of their anticancer action.
{"title":"Molecular Dynamics Simulation of Passive Diffusion across a Human Breast Cancer Cell Membrane Model. Comparison between Cisplatin and Its Pt(IV) Derivatives.","authors":"Daniele Belletto,Stefano Scoditti,Stefano Borocci,Nico Sanna,Costantino Zazza,Emilia Sicilia","doi":"10.1021/acs.jcim.5c02819","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02819","url":null,"abstract":"The efficacy of platinum(II) drugs, despite their wide use in clinical practice, is seriously limited by their well-known drawbacks. Octahedral Pt(IV) congeners are considered a sort of Holy Grail in cancer research as, being significantly more inert, they should be able to overcome the limitations of current platinum-based drugs, such as resistance and side effects, acting as prodrugs. Additionally, their anticancer activity can be tuned through a proper choice of the axial ligands released inside cancer cells when these compounds are reduced, making them even capable of potentially working as multiaction agents. However, despite their very satisfactory anticancer effects, no Pt(IV) complex has been approved for clinical use. As cell membrane permeation is the critical step, very poorly understood, of the whole mechanism of action of any drug, the investigation of the eventual differences in behavior between four-coordinate Pt(II) and six-coordinate Pt(IV) complexes when they diffuse in a lipid bilayer might be of significant relevance. The outcomes of a biased molecular dynamics (MD) investigation of the permeation of cisplatin and three simple cisplatin Pt(IV) derivatives through a membrane model prototype of human breast cancer cells are illustrated here. This comparative analysis of Pt(II) and Pt(IV) complex passive diffusion has been carried out with the aim of gaining indications about the factors that play a role in favoring or hindering membrane penetration and, ultimately, in determining the efficacy of their anticancer action.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"3 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147359155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
I-motifs are noncanonical DNA secondary structures stabilized by hemiprotonated C+:C base pairs. Their intrinsic flexibility, conformational heterogeneity, and sensitivity to environmental conditions often hinder structural characterization. Here, all-atom simulations, combined with biophysical experiments, were used to characterize the structure of the i-motif monomer formed by the HRAS gene promoter (iHRAS), a member of the RAS proto-oncogene family. Our results reveal that iHRAS exhibits intricate conformational behavior characterized by multiple interconverting states. The core i-motif is stabilized on one side by a protective G:G cap, a recurrent i-motif-stabilizing factor, while the C+:C base-pair content on the other side is variable. Structural heterogeneity is most pronounced in the loops, which sample several base-exposed states aided by K+ ion binding. These findings contribute to a deeper understanding of i-motif structure and dynamics.
{"title":"Combined All-Atom Simulations and Biophysical Assays Uncover Loop-Driven Stabilization in the HRAS i-motif","authors":"Alhadji Malloum,Valentina Arciuolo,Pavlína Pokorná,Luca Grisanti,Bruno Pagano,Jussara Amato,Alessandra Magistrato","doi":"10.1021/acs.jcim.5c03176","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c03176","url":null,"abstract":"I-motifs are noncanonical DNA secondary structures stabilized by hemiprotonated C+:C base pairs. Their intrinsic flexibility, conformational heterogeneity, and sensitivity to environmental conditions often hinder structural characterization. Here, all-atom simulations, combined with biophysical experiments, were used to characterize the structure of the i-motif monomer formed by the HRAS gene promoter (iHRAS), a member of the RAS proto-oncogene family. Our results reveal that iHRAS exhibits intricate conformational behavior characterized by multiple interconverting states. The core i-motif is stabilized by a protective G:G cap, a recurrent i-motif-stabilizing factor, on one side, while the C+:C base pairs content on the other side is variable. Structural heterogeneity is most pronounced in loops, which sample several base-exposed states aided by K+ ion binding. These findings contribute to a deeper understanding of the i-motif structure and dynamics.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"402 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147346717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}