Predicting protein ligand-binding pockets is crucial for understanding various biological processes, drug discovery, and design. Existing methods predominantly convert proteins into 3D voxels and process them using extensive convolutions, which struggle to effectively capture long-range semantic information within proteins. Furthermore, they lack global modeling and adaptive filtering of cross-layer features, limiting the precise characterization of pocket detail features. To tackle these issues, we propose a novel U-shaped network architecture that integrates spatial gating mechanisms and local feature enhancement for accurate protein–ligand binding pocket prediction. Specifically, we improve the traditional U-shaped network encoder by integrating the Mamba module and a Local Feature Enhancement (LFE) module to achieve efficient global modeling and adaptive enhancement of local features. Additionally, we introduce a novel Spatial Enhanced Mamba Gate (SEMG) module at skip connections to filter redundant information and enhance multiscale feature fusion. Experiments across extensive protein–ligand data sets demonstrate that our approach outperforms existing methods in both performance and interpretability.
Title: SGLEPocket: A Spatial Gating and Local Feature Enhancement Network for Protein–Ligand Binding Pocket Prediction
Authors: Xiyun Yang, Wei Zhang, Chenxi Luo, Zhaohong Deng, Dongxing Gu, Dong-jun Yu, Shudong Hu, Yanqi Zhong
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
Pub Date: 2026-03-25
DOI: 10.1021/acs.jcim.5c03009 (https://doi.org/10.1021/acs.jcim.5c03009)
Accurately predicting the binding affinity between ribonucleic acid (RNA) and small molecules (RSMA) is crucial for RNA-targeted drug discovery, yet existing computational methods face challenges in fully leveraging multisource feature information and modeling complex interactions. To address these challenges, this paper presents DeepMIF, an innovative deep learning framework based on a novel multiview interactive fusion paradigm. Initially, the framework employs a hybrid RNA representation combining a Localized Enhanced Scalable k-mer (L-ESKmer) strategy with pretrained embeddings to capture multiscale sequence patterns, while simultaneously extracting small molecule features from both sequence and graph views, yielding four distinct feature channels. At its core is an advanced multiview interactive fusion module wherein fine-grained interactions among multiple molecular modalities are modeled. Information is subsequently exchanged through a multihead cross-attention network equipped with a fused value vector. This mechanism transforms the attention process from simple information retrieval into an intelligent information synthesis, dynamically building a shared value vector from the context of all modalities. In a rigorous 5-fold cross-validation (CV) on a benchmark data set of 1439 RNA–small molecule pairs, DeepMIF demonstrates state-of-the-art performance, achieving a Pearson correlation coefficient (PCC) of 0.796 and a root-mean-square error (RMSE) of 0.874. More importantly, the model exhibits a strong generalization ability and robustness in challenging cold-start scenarios. The capability of DeepMIF to capture biologically meaningful, critical binding sites is further confirmed through interpretability analysis and case studies, highlighting its potential to guide structure-based RNA-targeted drug design.
Title: DeepMIF: A Multiview Interactive Fusion-Based Deep Learning Method for RNA–Small Molecule Binding Affinity Prediction
Authors: Jinmiao Song, Annan Gao, Shengwei Tian, Qimeng Yang, Lei Deng, Qilin Feng, Meitong Hou
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
Pub Date: 2026-03-25
DOI: 10.1021/acs.jcim.5c02946 (https://doi.org/10.1021/acs.jcim.5c02946)
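The evaluation protocol reported for DeepMIF (5-fold CV scored by PCC and RMSE) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper names `k_fold_indices`, `pearson`, and `rmse`, the shuffling seed, and the round-robin fold assignment are assumptions. The abstract states only the fold count, the data set size (1439 pairs), and the two metrics.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds (hypothetical helper)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment keeps fold sizes within one sample of each other.
    return [indices[i::k] for i in range(k)]


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted affinities."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted affinities."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# 1439 pairs, as in the benchmark described above; each fold serves once as the test set.
folds = k_fold_indices(1439, k=5)
```

In a full run, each fold would be held out in turn while a model is trained on the other four, and the two metrics would be averaged over the five held-out folds.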
Pub Date: 2026-03-25; DOI: 10.1021/acs.jcim.6c00108
Zhengyue Zhang, Gaia Dolcetti, Christian Tyrchan, Leonardo De Maria, Giovanni Bussi, Werngard Czechtizky
RNAs are increasingly recognized as promising drug targets, as both coding and noncoding RNAs act as key regulators in disease-related biological processes. However, a significant gap persists between the number of known RNA sequences and the solved RNA structures, posing a major bottleneck for RNA-targeted drug discovery. RNA secondary structure prediction offers the potential to facilitate the identification of druggable sites in novel RNA sequences by rapidly predicting base pairing patterns. In this study, we benchmarked widely used RNA secondary structure prediction tools against a newly curated dataset of ligand-bound RNA structures. We found that most tools achieve reasonable accuracy for RNAs with short sequences and simple motifs, but their performance declines for longer RNAs and those containing pseudoknots. Notably, prediction accuracy is reduced within ligand binding sites, where noncanonical base pairs and complex secondary structure elements are prevalent yet consistently unrecognized by the tools. Consequently, RNA ligand binding sites are poorly reconstructed by secondary structure predictions. This work provides the first comprehensive assessment of RNA secondary structure prediction for ligand-bound RNAs and demonstrates the challenges for integrating these methods into RNA-targeted drug discovery pipelines.
Title: Exploring Secondary Structure Predictions for RNA-Targeted Drug Discovery: Power and Challenges
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
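Benchmarking a secondary structure predictor against a reference structure usually comes down to comparing sets of base pairs. The abstract does not say which metric the authors used, so the sketch below assumes base-pair precision, recall, and F1 computed from dot-bracket strings. The function names are hypothetical, and the extra bracket types (`[]`, `{}`, `<>`) follow the common convention for annotating pseudoknots.

```python
def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded in a dot-bracket string.

    Each bracket type has its own stack, so crossing (pseudoknotted)
    pairs written with different bracket types are handled.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = set()
    for i, char in enumerate(structure):
        if char in openers:
            stacks[char].append(i)
        elif char in closers:
            stack = stacks[closers[char]]
            if not stack:
                raise ValueError(f"unmatched closing bracket at position {i}")
            pairs.add((stack.pop(), i))
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def base_pair_scores(reference, predicted):
    """Precision, recall, and F1 of predicted base pairs against the reference."""
    true_positives = len(reference & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Restricting both pair sets to positions inside a ligand binding site before scoring would give a site-level figure of the kind the abstract discusses. Noncanonical pairs can only be counted if the reference annotation includes them.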
Pub Date: 2026-03-25; DOI: 10.1021/acs.jcim.6c00224
Yue Chen, Junhao Li, Lucie Delemotte, Yuguang Mu
The glucagon-like peptide-1 receptor (GLP-1R) is a key therapeutic target for metabolic disorders, particularly type 2 diabetes and obesity. Although current treatments are effective, their unavoidable side effects continue to drive the search for novel therapeutic strategies. Ago-allosteric modulators (ago-PAMs), which act as agonists on their own while enhancing the affinity and efficacy of orthosteric agonists, represent a promising avenue to overcome limitations associated with traditional peptide-based therapies. However, the molecular mechanisms by which ago-PAMs modulate GLP-1R activation remain poorly understood. In this work, we selected compound 2, a validated ago-PAM of GLP-1R, as a probe to explore these mechanisms at the atomic level. Using molecular dynamics (MD) simulations, we elucidate how compound 2 stabilizes the active conformation of GLP-1R through allosteric binding and reveal distinct pathways by which it enhances the binding of both peptide and non-peptide orthosteric agonists. Enhanced sampling simulations further provided a comprehensive conformational landscape of GLP-1R activation, identifying two intermediate states that bridge inactive and active conformations. Compound 2 was found to bias the receptor toward active-like ensembles, consistent with its intrinsic agonist activity. Together, our findings provide mechanistic insights into ago-allosteric modulation of GLP-1R, offering useful information for the rational design of small-molecule modulators with improved therapeutic profiles.
Title: Unveiling the Activation Mechanism of Glucagon-Like Peptide-1 Receptor by an Ago-Allosteric Modulator via Molecular Dynamics Simulations
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
Pub Date: 2026-03-24; DOI: 10.1021/acs.jcim.5c03216
Zhenxiang Xu, Jiayi Que, Yue Hong, Lantian Yao, Juan Liu, Xiangrong Liu
Accurate prediction of drug-target interactions (DTIs) is foundational to drug development. Over the years, representation learning methods based on sequences and relational knowledge have shown considerable promise in this field. However, DTI prediction remains a challenging task, particularly in cold-start settings and few-shot scenarios involving novel drugs or proteins. Therefore, we propose a novel DTI prediction framework. To enhance the model’s generalization in settings with scarce labels and unseen entities, we introduce a link-based contrastive learning strategy. Instead of aligning entity-level global features, this strategy aligns fine-grained local features derived from both the sequence and relational modalities. Complementing this, we introduce a link-based cross-attention mechanism. This mechanism captures contextual features specific to individual drug–protein pairs conditioned on different links, providing necessary local features for contrastive learning strategies. Our model was evaluated on both cold-start and few-shot datasets involving unseen drugs or proteins, and significantly outperformed state-of-the-art (SOTA) methods. Furthermore, when evaluated in conventional data-rich settings, our model still demonstrates superior performance over current approaches.
Title: CrossLinker: Aligning Relational and Sequential Contexts for Drug-Target Interaction Prediction in Cold-Start and Few-Shot Scenarios
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
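The idea of aligning link-level features from two modalities can be illustrated with a generic InfoNCE-style contrastive loss. This is a sketch under stated assumptions, not CrossLinker's actual objective: the abstract does not give the loss form, so the cosine similarity, the temperature value, and the function names `cosine` and `info_nce` are all hypothetical. Row `i` of each input is assumed to describe the same drug–protein link in the sequence view and the relational view.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(seq_feats, rel_feats, temperature=0.1):
    """Mean InfoNCE loss: pull matching rows together, push other rows apart."""
    losses = []
    for i, anchor in enumerate(seq_feats):
        logits = [cosine(anchor, other) / temperature for other in rel_feats]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each link's two views are more similar to each other than to the views of other links in the batch, which is the alignment behavior the abstract describes at the local-feature level.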
Accurate modeling of the volumetric behavior of ionic liquids (ILs) is crucial for guiding the design of electrolytes for energy storage and other chemical systems. While classical group contribution methods (GCMs) are grounded in thermodynamic theory, traditional machine learning (ML) models often lack physically consistent predictions and generalizability. To improve this, a hybrid modeling strategy is introduced that couples a reoptimized Classical-GCM with a physics-informed neural network (PINN-GCM), where thermodynamically optimized parameters from the Tait equation are directly incorporated into the hybrid loss function of the network. Building on the previous efforts of Jacquemin et al. (Ind. Eng. Chem. Res., 2017, 56, 6827-6840), the data set was extracted from the National Institute of Standards and Technology (NIST) database. The PINN-GCM framework was evaluated across 92 ILs, comprising 8,467 experimental data points spanning 217-473 K and 0.1-207 MPa. The aggregate performance yielded average RAAD values of 0.067 and 0.065% for the training and test sets, respectively, at the IL level. The ion-level models were trained on 6,049 points from 59 ILs (32 cations and 28 anions), with extrapolation evaluated on 2,958 points from 21 unseen IL combinations, demonstrating strong combinatorial generalization to new pairings of known ions, although structural generalization to entirely novel ion chemistries remains beyond the scope of the current model. The framework shows promise for integration into process simulation tools and extension to related IL properties (viscosity and conductivity), although its applicability is validated within the experimental temperature-pressure range and requires ions present in the established library. 
This strategy highlights the potential of merging physics-based modeling and ML to develop foundational models for multiproperty prediction, thereby promoting the improved design of safer electrolytes and other chemical systems in the future.
Title: Physics-Guided Machine Learning for Ionic-Liquid Volumetric Properties.
Authors: Kingsley Omeoga, Tausif Altamash, Mouad Dahbi, Johan Jacquemin
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
Pub Date: 2026-03-24
DOI: 10.1021/acs.jcim.5c02962 (https://doi.org/10.1021/acs.jcim.5c02962)
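To make the ingredients named in the abstract concrete, the sketch below shows a Tait-type pressure correction for density, a relative average absolute deviation (RAAD) in percent, and one possible hybrid loss. All of it is illustrative. The Tait form used is the common one, rho(T, P) = rho(T, P_ref) / (1 - C ln((B(T) + P) / (B(T) + P_ref))), and the paper's exact parametrization may differ. The RAAD definition, the way the physics term enters the loss, the `weight` parameter, and all numeric inputs are assumptions.

```python
import math


def tait_density(rho_ref, pressure, p_ref, b, c):
    """Density at `pressure` from the density at `p_ref` via a Tait-type equation.

    `b` stands for B(T) already evaluated at the temperature of interest and
    shares the pressure unit (e.g. MPa); `c` is dimensionless.
    """
    return rho_ref / (1.0 - c * math.log((b + pressure) / (b + p_ref)))


def raad_percent(calculated, experimental):
    """Relative average absolute deviation, in percent (assumed definition)."""
    deviations = (abs(c - e) / e for c, e in zip(calculated, experimental))
    return 100.0 * sum(deviations) / len(experimental)


def hybrid_loss(predicted, experimental, physics, weight=0.5):
    """Data misfit plus a penalty for drifting from the physics-based estimate.

    `physics` holds densities from the group-contribution/Tait model; how the
    authors actually combine the two terms is not stated in the abstract.
    """
    n = len(predicted)
    data_term = sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n
    physics_term = sum((p - g) ** 2 for p, g in zip(predicted, physics)) / n
    return data_term + weight * physics_term
```

With a loss of this shape, the network is rewarded for fitting the measurements while staying close to a thermodynamically grounded baseline, which is one way to read "physics-informed" in this setting.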
Pub Date: 2026-03-24; DOI: 10.1021/acs.jcim.5c03224
Shuyu Zhong, Yuying Jiang
AlphaFold2 (AF2) has greatly increased the availability of predicted protein structures, but many binding-site prediction methods trained on experimentally determined Protein Data Bank (PDB) complexes perform less well when applied to AF2 models. This loss of accuracy reflects differences between ligand-bound experimental structures and predominantly apo-like predicted models, as well as nonuniform local structural reliability indicated by pLDDT scores. To address this, we introduce ProtCross, a confidence-aware domain adaptation framework for residue-level binding-site prediction on predicted protein structures. Proteins are represented as residue point clouds and encoded using a hierarchical PointNet++ architecture, with ESM-C protein language model embeddings providing physicochemical and evolutionary information. To transfer models trained on labeled PDB structures to unlabeled AF2 models, ProtCross employs adversarial domain adaptation with a gradient reversal layer, while weighting the domain-adversarial loss by pLDDT to reduce the influence of low-confidence regions. On an AF2 test set derived from the PDBbind v2020 Refined Set, ProtCross shows improved performance relative to existing binding-site predictors and an architecture-matched geometric baseline, as measured by area under the ROC curve and segmentation accuracy. Ablation analyses indicate that pLDDT-guided weighting mitigates negative transfer observed with standard domain-adversarial training.
Title: ProtCross: Bridging the PDB-AlphaFold Gap for Binding Site Prediction with Protein Point Clouds.
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
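The two mechanisms named in the abstract, a gradient reversal layer and pLDDT weighting of the domain-adversarial loss, can be sketched without a deep learning framework. This is a simplified illustration rather than ProtCross's implementation: the linear weight pLDDT/100, the normalization by the weight sum, and the function names are assumptions. Experimental (PDB) residues, which have no pLDDT, are assumed to be passed in with a score of 100.

```python
import math


def grad_reverse_backward(upstream_grad, lam=1.0):
    """Backward pass of a gradient reversal layer.

    The forward pass is the identity; on the way back the gradient is
    negated and scaled by `lam`, so the encoder learns to confuse the
    domain classifier while the classifier itself trains normally.
    """
    return [-lam * g for g in upstream_grad]


def weighted_domain_loss(domain_probs, domain_labels, plddt):
    """Per-residue binary cross-entropy of the domain classifier, weighted by pLDDT.

    `domain_probs` are predicted probabilities of the target (AF2) domain,
    `domain_labels` are 0 (PDB) or 1 (AF2), and `plddt` is on a 0-100 scale.
    """
    eps = 1e-12
    weights = [score / 100.0 for score in plddt]
    total = 0.0
    for prob, label, weight in zip(domain_probs, domain_labels, weights):
        prob = min(max(prob, eps), 1.0 - eps)  # keep the logarithms finite
        bce = -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))
        total += weight * bce
    norm = sum(weights)
    return total / norm if norm > 0 else 0.0
```

Residues with low pLDDT contribute little to the adversarial signal under this weighting, which matches the abstract's stated aim of reducing the influence of low-confidence regions.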
Pub Date: 2026-03-24; DOI: 10.1021/acs.jcim.5c02868
Michael Kilgour, Mark E. Tuckerman, Jutta Rogal
We present MXtalTools, a flexible Python package for the data-driven modeling of molecular crystals, facilitating machine learning studies of the molecular solid state. MXtalTools comprises several classes of utilities: (1) synthesis, collation, and curation of molecule and crystal data sets, (2) integrated workflows for model training and inference, (3) crystal parametrization and representation, (4) crystal structure sampling and optimization, (5) end-to-end differentiable crystal sampling, construction, and analysis. Our modular functions can be integrated into existing workflows or combined and used to build novel modeling pipelines. MXtalTools leverages CUDA acceleration to enable high-throughput crystal modeling. The Python code is available open-source on our GitHub page, with detailed documentation on ReadTheDocs.
Title: MXtalTools: A Toolkit for Machine Learning on Molecular Crystals
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
Pub Date: 2026-03-23; DOI: 10.1021/acs.jcim.6c00569
Christian Fellinger, Marion Sappl, András Szabadi, Benjamin Merget, Klaus-Juergen Schleifer, Thierry Langer
AutoPocket2CREST is an automated workflow for preparing protein-ligand binding pockets for CREST conformational sampling. Starting from protein and ligand structures, the method identifies the ligand, constructs a chemically consistent pocket around it, applies optional backbone constraints, and postprocesses CREST conformers to restore structural annotations. AutoPocket2CREST integrates common open-source tools and enables reproducible semiempirical conformational sampling of protein-bound ligands.
Title: AutoPocket2CREST: Automating Binding Pocket Extraction for the CREST Conformer Generation Pipeline.
Journal: Journal of Chemical Information and Modeling (impact factor 5.6)
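The pocket-construction step can be pictured as a distance-based residue selection around the ligand. The sketch below is a generic illustration and does not reproduce AutoPocket2CREST's procedure: the function name `pocket_residues`, the input layout, and the 5.0 Å cutoff are assumptions. Selecting whole residues rather than individual atoms is one simple way to avoid cutting through side chains. Capping of the resulting chain breaks is not shown.

```python
import math


def pocket_residues(protein_atoms, ligand_coords, cutoff=5.0):
    """Return sorted IDs of residues with any atom within `cutoff` of the ligand.

    `protein_atoms` is a list of (residue_id, (x, y, z)) tuples and
    `ligand_coords` a list of (x, y, z) tuples, all in angstroms.
    """
    selected = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in selected:
            continue  # the residue is already in the pocket
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            selected.add(residue_id)
    return sorted(selected)


# Toy coordinates: residues 1 and 3 have an atom near the ligand, residue 2 does not.
atoms = [
    (1, (0.0, 0.0, 0.0)),
    (1, (1.0, 0.0, 0.0)),
    (2, (10.0, 0.0, 0.0)),
    (3, (4.0, 0.0, 0.0)),
]
pocket = pocket_residues(atoms, [(0.0, 0.0, 1.0)])
```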
Pub Date : 2026-03-23 DOI: 10.1021/acs.jcim.6c00082
Juan Manuel Prieto,Jose A D Cuellar Estrada,Camila Mara Clemente,Marcelo A Martí
In this work, we present practical recommendations for the setup, analysis, and integration of mixed-solvent molecular dynamics (MixMD), solvent-site-biased docking (SSBD) workflows, and pharmacophore analysis, drawing on more than a decade of accumulated experience in the field from multiple implementations and applications. Rather than providing a comprehensive review of all applications of MixMD, this Perspective focuses specifically on its use as a methodological foundation for deriving solvent sites that inform docking and pharmacophore-based strategies in structure-based drug design. Currently, mixed-solvent simulations and solvent-site-biased docking constitute a coherent, experimentally validated strategy for identifying and exploiting binding hot spots in proteins, and for translating solvent occupancy patterns into structurally interpretable pharmacophoric features and docking constraints. By standardizing best practices and synthesizing previously published computational studies into a unified methodological framework, we aim to facilitate broader adoption of these methods within the structure-based drug design community, enabling more reliable identification of functional sites and accelerating rational ligand discovery.
Title: Best Practices in Mixed-Solvent Molecular Dynamics and Solvent-Site-Biased Docking. Journal: Journal of Chemical Information and Modeling. DOI: https://doi.org/10.1021/acs.jcim.6c00082