Pub Date : 2026-03-09 DOI: 10.1021/acs.jcim.5c02998
Måns K Rosenbaum,David van der Spoel
Molecular simulation tools, such as GROMACS, are used routinely to produce time series of energies and other observables. To turn these data into publication-quality figures, a user can either use a (commercial) software package with a graphical user interface, often offering fine control and high-quality output, or write their own code to make plots using a scripting language. In the age of big data and machine learning, it is often necessary to generate many graphs, be able to rapidly inspect them, and make plots for manuscripts. Here, we provide a simple Python tool, plotXVG, built on the well-known Matplotlib plotting library, that will generate publication-quality graphics for line graphs as well as heatmaps and contour plots. This will allow users to rapidly and reproducibly generate a series of graphics files without programming, but a simple application programming interface is available as well for incorporation in, e.g., machine learning applications. Obviously, the tool is applicable to any kind of line graph data or heatmap, not just that from molecular simulations. plotXVG is available as free and open source, which implies that users can extend the tool to their own needs.
Title : plotXVG: Batch Generation of Publication-Quality Graphs from GROMACS Output. Journal: Journal of Chemical Information and Modeling
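To make the batch-plotting idea concrete, here is a minimal sketch of reading XVG text and handing the columns to Matplotlib. It is not the plotXVG API: the function names `parse_xvg` and `plot_xvg` are hypothetical, and the parser handles only the common `#` comments, `@` directives for title, axis labels and series legends, and whitespace-separated numeric columns.

```python
import io


def _quoted(text):
    """Return the text between the first and last double quote, or None."""
    first, last = text.find('"'), text.rfind('"')
    return text[first + 1:last] if 0 <= first < last else None


def parse_xvg(stream):
    """Parse XVG text into (metadata, columns). Hypothetical helper, not plotXVG."""
    meta = {"title": None, "xlabel": None, "ylabel": None, "legends": []}
    rows = []
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith(("#", "&")):
            continue  # blank line, comment, or data-set separator
        if line.startswith("@"):
            directive = line[1:].strip()
            if directive.startswith("title"):
                meta["title"] = _quoted(directive)
            elif directive.startswith("xaxis"):
                meta["xlabel"] = _quoted(directive)
            elif directive.startswith("yaxis"):
                meta["ylabel"] = _quoted(directive)
            elif directive.startswith("s") and " legend " in directive:
                meta["legends"].append(_quoted(directive))
            continue  # other directives are ignored in this sketch
        rows.append([float(tok) for tok in line.split()])
    columns = [list(col) for col in zip(*rows)]
    return meta, columns


def plot_xvg(path, out_png):
    """Plot every data column against the first one and save a PNG."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, suitable for batch runs
    import matplotlib.pyplot as plt

    with open(path) as handle:
        meta, cols = parse_xvg(handle)
    fig, ax = plt.subplots()
    for i, ys in enumerate(cols[1:]):
        label = meta["legends"][i] if i < len(meta["legends"]) else None
        ax.plot(cols[0], ys, label=label)
    ax.set_title(meta["title"] or "")
    ax.set_xlabel(meta["xlabel"] or "")
    ax.set_ylabel(meta["ylabel"] or "")
    if meta["legends"]:
        ax.legend()
    fig.savefig(out_png, dpi=300)
    plt.close(fig)


# Small in-memory example (made-up numbers, for illustration only).
SAMPLE = """# produced by some analysis tool
@    title "Energy"
@    xaxis  label "Time (ps)"
@    yaxis  label "(kJ/mol)"
@ s0 legend "Potential"
0.0 -1.5
1.0 -1.4
"""
meta, columns = parse_xvg(io.StringIO(SAMPLE))
```

Looping `plot_xvg` over a directory of files would give the kind of unattended batch run the abstract describes; Matplotlib is imported lazily so the parser can be used without it.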
Pub Date : 2026-03-09 DOI: 10.1021/acs.jcim.5c02826
Haotian Guan,Tian Bai,Chuande Yang,Tao Zhang,Han Wang,Guishen Wang
Accurately predicting drug-target interactions (DTIs) is crucial for drug discovery and repositioning. However, most deep learning-based DTI models are designed in Euclidean space, making it difficult to effectively represent the hierarchical and scale-free characteristics of biological data. Owing to its negative curvature, hyperbolic space can represent hierarchical relationships within data more effectively. Therefore, we propose a multimanifold learning framework that integrates multimodal features in hyperbolic and Euclidean spaces for drug-target interaction prediction. Specifically, we employ a Hyperbolic Graph Neural Network (HGNN) to extract features from molecular graphs of small-molecule drugs, thereby effectively capturing the hierarchical structural information within these graphs. To integrate heterogeneous information, a Multi-Manifold Feature Fusion Module combines structural features from the HGNN, chemical fingerprints, and semantic embeddings derived from pretrained language models. Extensive experiments on benchmark data sets demonstrate that our framework achieves superior performance compared with state-of-the-art Euclidean-based methods. The experimental results demonstrate that hyperbolic geometry offers significant advantages in extracting hierarchical features from non-Euclidean data and also highlight the promising potential of multimanifold feature fusion in the field of drug-target interaction prediction.
Title : MML-DTI: Multimanifold Learning with Hyperbolic Graph Neural Networks for Enhanced Drug-Target Interaction Prediction. Journal: Journal of Chemical Information and Modeling
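The remark that negative curvature suits hierarchical data can be illustrated with the distance function of the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model the HGNN uses, so the choice of the Poincaré ball (curvature -1) and the name `poincare_distance` are assumptions made for this sketch.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    # d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


# The same Euclidean gap of 0.1 is much longer near the boundary than near the origin.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
```

Because distances grow rapidly toward the boundary, the ball has room for many well-separated leaf nodes around a few central ones, which is the property usually cited when embedding tree-like structures.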
Pub Date : 2026-03-09 DOI: 10.1021/acs.jcim.5c02840
Shivam Gupta,Taraknath Mandal
CC chemokine receptor type 5 (CCR5) functions as a key coreceptor facilitating HIV entry into host cells. Recent experimental findings suggest that CCR5 preferentially localizes at lipid domain boundaries within the host cell membrane, where its positioning enhances viral fusion efficiency by allowing the HIV fusion peptide gp41 to exploit the mechanically weaker interface regions. In this study, we employ coarse-grained molecular dynamics simulations to investigate the spatial organization of CCR5 within domain-forming model membranes. Our results reveal a molecular mechanism by which CCR5 preferentially migrates to and stabilizes at domain boundaries. Additionally, we show that lysophosphatidylcholine (lysoPC) lipids, acting as linactants, accumulate at domain interfaces, reduce line tension, and ultimately disrupt membrane domain organization. This disruption leads to a delocalization of CCR5, potentially impairing the ability of gp41 to target membrane boundaries for fusion. Together, our findings suggest that linactants may be employed to disrupt the spatial organization of CCR5, potentially hindering HIV's ability to initiate membrane fusion and entry.
Title : Controlling Spatial Organization of HIV Coreceptor CCR5. Journal: Journal of Chemical Information and Modeling
Pub Date : 2026-03-09 DOI: 10.1021/acs.jcim.6c00060
Florence Szczepaniak,Donghyuk Suh,Wonpil Im
Recent advances in machine learning (ML) have enabled new developments in molecular dynamics simulation. Neural network potentials (NNPs) trained on quantum mechanical (QM) data provide highly accurate descriptions of drug-like molecules. Analogous to a QM and molecular mechanical (QM/MM) approach, hybrid ML/MM simulations employ NNPs to describe a localized region of the system, such as a ligand, while the rest of the system is treated using classical MM force fields. This hybrid framework enables simulations of protein-ligand complexes with near-QM accuracy for the ligand at a substantially reduced computational cost. CHARMM-GUI Hybrid ML/MM Builder automates the preparation of system and input files required for hybrid ML/MM modeling and simulation. This new module generates all necessary files to simulate protein-ligand complexes in solution or membrane using TorchANI-AMBER and OpenMM-ML. Currently supported NNPs include MACE and ANI. In this paper, we present Hybrid ML/MM Builder and representative application systems that demonstrate its usage and capabilities.
Title : CHARMM-GUI Hybrid ML/MM Builder for Hybrid Machine Learning and Molecular Mechanical Modeling and Simulations. Journal: Journal of Chemical Information and Modeling
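The hybrid ML/MM partition described in the abstract can be sketched as an energy expression. The abstract does not state the coupling scheme, so the sketch below assumes a mechanical-embedding form, E = E_MM(all atoms) - E_MM(ML region) + E_ML(ML region); the toy energy functions and all names are hypothetical stand-ins, not the Builder's output or any package's API.

```python
from itertools import combinations


def toy_mm_energy(coords):
    """Toy stand-in for an MM force field: harmonic pair terms on 1D coordinates."""
    return sum(0.5 * (abs(a - b) - 1.0) ** 2 for a, b in combinations(coords, 2))


def toy_ml_energy(coords):
    """Toy stand-in for an NNP: the MM energy plus a small constant correction."""
    return toy_mm_energy(coords) + 0.1


def mlmm_energy(coords, ml_indices, mm_energy, ml_energy):
    """Mechanical embedding: swap the MM description of the ML region for the ML one."""
    ml_region = [coords[i] for i in ml_indices]
    return mm_energy(coords) - mm_energy(ml_region) + ml_energy(ml_region)


coords = [0.0, 1.2, 2.1, 3.5]   # made-up 1D positions
ligand = [0, 1]                  # atoms treated with the ML potential
total = mlmm_energy(coords, ligand, toy_mm_energy, toy_ml_energy)
```

In this form the ligand's internal energy comes from the ML model, while ligand-environment interactions remain at the MM level, which is why only the small ML region incurs the higher cost.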
The cell cycle relies on sequential activation of cyclin-dependent kinases (CDKs) by phase-specific cyclins. Previously, we proposed that their conformations and activation speed are tuned to the needs of their respective phases. We demonstrated this principle by using molecular dynamics simulations to evaluate the slower activation and catalytic kinetics of Cyclin-D/CDK4 during the long G1 phase compared to the rapid activation of Cyclin-E/CDK2 in the brief G1/S transition, and the higher intrinsic activity of Cyclin-D/CDK6 required for rapid hematopoietic cell proliferation. Here, we ask whether this principle also holds for subsequent cell cycle phases. We explore how the dynamic behavior of structurally similar Cyclin-E/CDK2, Cyclin-A/CDK2, and Cyclin-A/CDK1 controls their distinct tasks, and how the cell ensures that Cyclin-A/CDK2 and Cyclin-A/CDK1, which share the same allosteric effector Cyclin-A, avoid redundantly triggering S and M-phase events out of order. Through molecular dynamics simulations, we find that their functional differences relate to their distinct conformational energy landscapes and kinetic profiles. Unlike the plastic interface of CDK1 complexes, the Cyclin-E/CDK2 complex, governing the G1/S transition, is conformationally constrained by a stable interface and is less dependent on its catalytic outputs. In contrast, the high catalytic efficiency of Cyclin-A/CDK2 can support rapid phosphorylation of S phase replication factors, thereby preventing DNA rereplication through preorganization of the CDK2 DFG-motif. We translate our results to the clinic by proposing an innovative allosteric degrader strategy for selective Cyclin-E degradation. We further validate our design workflow by reproducing the ternary complex of a known CDK2 degrader and by applying this approach to model an allosteric degrader, thereby establishing the structural parameters required to target this specific Cyclin-E/CDK2-cereblon conformational state.
Title : Cyclin-E/A/CDK1/2 Kinetic Landscapes Drive Cell Cycle Phase-Specific Progression and Guide Cyclin-E Degradation Strategy. Journal: Journal of Chemical Information and Modeling
Authors: Wengang Zhang,Devin Bradburn,Yonglan Liu,Hyunbum Jang,Mardo Kõivomägi,Ruth Nussinov
Pub Date : 2026-03-09 DOI: 10.1021/acs.jcim.6c00279
Pub Date : 2026-03-09 DOI: 10.1021/acs.jcim.6c00528
Alejandro Blanco-Gonzalez,William Betancourt,Ryan Michael Snyder,Shi Zhang,Timothy J. Giese,Zeke A. Piskulich,Andreas W. Götz,Kenneth M. Merz Jr.,Darrin M. York,Hasan Metin Aktulga,Madushanka Manathunga
General force fields such as General Amber Force Field (GAFF) have been designed for broad applicability and are widely used in protein–ligand binding simulations in structure-based drug discovery. However, the force field parameters are not always transferable across ligand molecules, and custom reparameterization is sometimes necessary for accurate binding free energy simulations. This is especially true for torsion parameters, which are highly dependent on stereoelectronic and steric effects. Here, we report a novel, flexible, and user-friendly computational tool called the Automated Force Field Developer and Optimizer (AFFDO) platform that allows generating accurate, tailored GAFF2 torsion parameters for drug-like molecules. For a given ligand, AFFDO selects the most important torsions, carries out GPU-accelerated density functional theory calculations to collect reference data and fits torsion terms using a fast gradient-based optimizer that leverages automated differentiation. We benchmark AFFDO by parametrizing a series of drug-like molecules and carrying out protein–ligand relative binding free energy (RBFE) simulations. The results show that AFFDO can significantly improve GAFF2 torsion parameters against QM reference data, which in some cases translates into better agreement with experimental RBFE values within a reasonable computational time.
Title : Automated Force Field Developer and Optimizer Platform: Torsion Reparameterization. Journal: Journal of Chemical Information and Modeling
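The torsion-fitting step can be illustrated with a small gradient-descent fit of a cosine series to a reference energy profile. This is a sketch under stated assumptions, not AFFDO's implementation: the phases are fixed at zero, the amplitudes `k[n]` play the role of V_n/2 in the Amber-style term (V_n/2)[1 + cos(n*phi - gamma)], the gradient is written by hand rather than obtained by automatic differentiation, and the "reference" profile is synthetic rather than DFT data.

```python
import math


def torsion_energy(phi, k):
    """Cosine-series torsion energy with zero phase: sum_n k_n * (1 + cos(n * phi))."""
    return sum(k_n * (1.0 + math.cos((n + 1) * phi)) for n, k_n in enumerate(k))


def fit_torsion(phis, e_ref, n_terms=3, lr=0.1, steps=500):
    """Minimize the mean squared error to e_ref by plain gradient descent."""
    k = [0.0] * n_terms
    m = len(phis)
    for _ in range(steps):
        grad = [0.0] * n_terms
        for phi, e in zip(phis, e_ref):
            residual = torsion_energy(phi, k) - e
            for n in range(n_terms):
                # d(residual^2)/dk_n = 2 * residual * (1 + cos(n * phi))
                grad[n] += 2.0 * residual * (1.0 + math.cos((n + 1) * phi)) / m
        k = [k_n - lr * g for k_n, g in zip(k, grad)]
    return k


# Synthetic reference scan: 24 dihedral angles in 15-degree steps (made-up amplitudes).
true_k = [1.2, 0.3, 0.5]
phis = [math.radians(-180.0 + 15.0 * i) for i in range(24)]
e_ref = [torsion_energy(phi, true_k) for phi in phis]
fitted_k = fit_torsion(phis, e_ref)
```

With fixed phases the problem is linear in the amplitudes, so a direct least-squares solve would also work; the iterative form is shown because it carries over to losses where phases or other nonlinear parameters are optimized as well.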
Pub Date : 2026-03-06 DOI: 10.1021/acs.jcim.5c02709
Peiyao Li,Lan Hua,Ye Liu,Jun Zhu
Deep learning has accelerated drug discovery by enabling large-scale virtual screening, but current models often act as "black boxes" and provide no formal guarantees about prediction reliability. This limitation is particularly critical for compound-protein interaction (CPI) prediction, where data sets are highly imbalanced and erroneous predictions can lead to costly failures. Here we introduce ConfBiXtCPI, an integrated framework that unifies accurate prediction, interpretability, and statistically rigorous uncertainty quantification. At its core is a bidirectional cross-attention transformer that captures molecular recognition patterns from sequence-level inputs, achieving state-of-the-art accuracy across multiple benchmarks. To address class imbalance and uncertainty, we incorporate Mondrian conformal prediction, which guarantees valid coverage for both majority and minority classes. Building on this, a conformal selection procedure enables principled control of the false discovery rate, allowing users to specify risk thresholds while maintaining discovery power. Beyond accuracy, ConfBiXtCPI provides mechanistic interpretability through attention maps that localize to biophysically relevant binding sites, and its uncertainty estimates support efficient active learning strategies. Together, these advances establish ConfBiXtCPI as a trustworthy and practical tool for guiding experimental validation and accelerating therapeutic discovery.
Title : Trustworthy Compound-Protein Interaction Prediction with Interpretable and Conformalized Cross-Attention Transformers. Journal: Journal of Chemical Information and Modeling
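Mondrian (class-conditional) conformal prediction, mentioned in the abstract, can be sketched in a few lines: each candidate label's p-value is computed only against calibration examples of that same class, which is what gives per-class coverage under imbalance. The function names and the toy nonconformity scores below are illustrative assumptions, not ConfBiXtCPI code.

```python
def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional conformal p-values.

    cal_scores / cal_labels: nonconformity scores and true labels of calibration items.
    test_scores: dict mapping each candidate label to the test item's nonconformity
    score under that label (higher means less typical).
    """
    p_values = {}
    for label, score in test_scores.items():
        same_class = [c for c, y in zip(cal_scores, cal_labels) if y == label]
        at_least = sum(1 for c in same_class if c >= score)
        p_values[label] = (at_least + 1) / (len(same_class) + 1)
    return p_values


def prediction_set(p_values, epsilon):
    """Keep every label whose p-value exceeds the significance level epsilon."""
    return {label for label, p in p_values.items() if p > epsilon}


# Imbalanced toy calibration set: four non-interacting (0) and two interacting (1) pairs.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9]
cal_labels = [0, 0, 0, 0, 1, 1]
p = mondrian_p_values(cal_scores, cal_labels, {0: 0.35, 1: 0.95})
labels = prediction_set(p, epsilon=0.35)
```

Here label 0 gets p = 2/5 and label 1 gets p = 1/3, so at epsilon = 0.35 only label 0 stays in the prediction set; an empty or two-label set would signal an unusual or ambiguous input.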
Pub Date : 2026-03-06 DOI: 10.1021/acs.jcim.5c02959
Wei Lin,Chi Chung Alan Fung
Accurate prediction of acute dermal toxicity is vital for the safe and effective development of contact drugs. While numerous deep learning models have been created to replace costly and ethically challenging animal toxicity tests, most approaches overlook the multiview information on molecules. To overcome this limitation, we introduce a novel model named MVIToxNet, which integrates multiview features from both molecular fingerprints and SMILES sequences. To capture the multiview information on SMILES, MVIToxNet incorporates character-level and atom-level features. In addition, byte-pair encoding tokenization is utilized to capture substructural details within molecules, allowing the model to differentiate similar SMILES by assigning distinct tokens to different substructures. Since the data sets in this study are small and imbalanced, we argue that selecting a single model based solely on the best validation performance may not reliably reflect the best generalization for test sets. Therefore, we propose a weighted model averaging approach that combines multiple trained models according to their top-K validation scores into one model, yielding an improved model for inference. Extensive experimental results demonstrate that MVIToxNet significantly outperforms existing baselines in acute dermal toxicity prediction, validating the effectiveness of utilizing multiview features and the weighted model averaging strategy. Furthermore, our proposed methods demonstrate the potential for data-driven model design.
Title : Integrating Multiview Information for Enhanced Deep Learning-Based Acute Dermal Toxicity Prediction. Journal: Journal of Chemical Information and Modeling
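One way to read the weighted model averaging step is as a score-weighted average of the parameters of the top-K checkpoints. The abstract does not give the weighting formula or say whether parameters or predictions are averaged, so the sketch below assumes parameter averaging with weights proportional to validation score; `average_top_k` and the toy checkpoints are hypothetical.

```python
def average_top_k(checkpoints, k):
    """Merge the k best checkpoints into one parameter set.

    checkpoints: list of (validation_score, params), where params maps a
    parameter name to a flat list of floats with the same shape in every checkpoint.
    """
    top = sorted(checkpoints, key=lambda item: item[0], reverse=True)[:k]
    total = sum(score for score, _ in top)
    weights = [score / total for score, _ in top]  # assumed: proportional to score
    merged = {}
    for name, values in top[0][1].items():
        merged[name] = [
            sum(w * params[name][i] for w, (_, params) in zip(weights, top))
            for i in range(len(values))
        ]
    return merged


# Three toy checkpoints; the weakest one (score 0.2) is dropped when k = 2.
checkpoints = [
    (0.6, {"w": [1.0, 0.0]}),
    (0.2, {"w": [9.0, 9.0]}),
    (0.4, {"w": [0.0, 1.0]}),
]
merged = average_top_k(checkpoints, k=2)
```

Averaging several good checkpoints rather than picking the single best one reduces the dependence on a noisy validation estimate, which is the concern the abstract raises for small, imbalanced data sets.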
With the rapid growth of high-throughput sequencing data, many proteins remain uncharacterized, while experimental validation is costly and time-consuming. Automatic Function Prediction (AFP) is thus urgently needed. Protein functions are complex and multilevel, with inherent interactions among features such as sequence, structure, and evolution. Existing methods relying on single-level representations or simple feature aggregation struggle to capture the hierarchical dependencies and semantic collaborative relationships in the Gene Ontology (GO) label system, limiting prediction accuracy and generalization. To overcome these challenges, we propose a Multi-View Collaboration Feature Fusion (MVCFF) framework, which leverages complementary features from multiple sequence perspectives to enhance protein function prediction. In MVCFF, a sequential feature extraction subnetwork is designed to capture view-specific information, incorporating both local patterns and long-range dependencies within amino acid sequences. Building on this, a multi-view collaboration paradigm is employed, enabling interactive learning of key positional information through integrated multi-view features and facilitating synergistic information fusion. The resulting multi-view representations are then fed into downstream label predictors to perform classification tasks. To further boost predictive accuracy, we introduce an extended version, MVCFF+, which combines the original MVCFF framework with sequence-similarity-based prediction methods via a weighted fusion strategy. Extensive experiments demonstrate that our approach substantially improves prediction performance, outperforming existing methods by a clear margin. The source code is publicly available at https://github.com/AGI-FBHC/MVCFF.
Multi-View Collaboration Feature Fusion for Protein Function Prediction. Journal of Chemical Information and Modeling (IF 5.6)
Pub Date : 2026-03-06 DOI: 10.1021/acs.jcim.5c03057
Hailong Yang, Zhongyu Wang, Haijun Shi, Qiao Ning, Zhaohong Deng, Shudong Hu, Yanqi Zhong
Current enzyme thermostability prediction models are predominantly designed for cross-family generalization, with limited focus on hydroxylases, which restricts their accuracy and applicability in hydroxylase-specific thermostability design. In this study, we develop HyS-BST, a dedicated self-trained semisupervised framework for hydroxylase thermostability prediction. Leveraging a limited hydroxylase data set, HyS-BST integrates a self-training strategy with Bayesian dynamic tuning to achieve high-precision prediction of mutant thermostability in terms of ΔΔG. Experimental results demonstrate that after only ten training iterations, HyS-BST attains a coefficient of determination (R²) of 0.96, a Pearson correlation coefficient (PCC) of 0.98, and a root mean squared error (RMSE) as low as 0.06 on the test set. Compared with the optimal cross-family generalization model, HyS-BST improves PCC and RMSE by approximately 70%. Overall, this framework provides a specialized, efficient, and cost-effective solution for hydroxylase thermostability prediction, substantially reducing the candidate search space and experimental resources required for downstream validation.
Hydroxylase Thermostability Prediction Based on Self-Trained Semisupervised Iteration and Bayesian Dynamic Tuning. Journal of Chemical Information and Modeling (IF 5.6)
Pub Date : 2026-03-05 DOI: 10.1021/acs.jcim.6c00102
Sujuan Liu, Mengyu Yu, Lei Zhang, Dongyan Wen, Xiaotong Yu, Jianmei Luo, Chuanlei Zhang