Type III CRISPR systems provide adaptive immunity against invading foreign nucleic acids by generating cyclic oligoadenylate (cAn) second messengers, which activate effector proteins containing CRISPR-associated Rossmann fold (CARF) domains. The apo form of CARF adopts a closed state, distinct from its open conformation in the cA4-bound state. To investigate this conformational transition, we performed multiple types of molecular dynamics (MD) simulations, which revealed a unidirectional conformational shift toward the closed state. This transition was hindered by reduced flexibility in the cA4-binding residues. Notably, the conformational change occurs primarily between the two monomers, with minimal structural rearrangement within each individual monomer. Comparative analysis showed that while the numbers of hydrogen bonds and contacts between CARF and cA4 decrease in the closed state, intermonomer interactions are strengthened. Binding free-energy calculations between the two CARF chains further confirmed a higher intermonomer affinity in the closed state. Our findings support an energy-driven model of the conformational change and provide insights for optimizing CRISPR-based genetic manipulation tools.
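The two intermonomer measures mentioned above, interchain contact counts and chain–chain binding free energies, can be illustrated with a minimal NumPy sketch. It assumes per-frame heavy-atom coordinates for each chain and per-frame energies from some end-state method (an MM/GBSA-style single-trajectory estimate is used here for illustration); the 4.5 Å cutoff and the function names are hypothetical choices, not necessarily the protocol used in this study.

```python
import numpy as np

def interchain_contacts(coords_a: np.ndarray, coords_b: np.ndarray, cutoff: float = 4.5) -> int:
    """Count atom pairs (one atom from each chain) closer than `cutoff` (Å).

    coords_a: (N, 3) coordinates of chain A; coords_b: (M, 3) of chain B.
    The cutoff is an illustrative assumption.
    """
    # (N, M, 3) displacement array via broadcasting, then (N, M) distances
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return int((dist < cutoff).sum())

def end_state_binding_energy(g_complex, g_chain_a, g_chain_b):
    """Single-trajectory end-state estimate of chain-chain binding free energy.

    Each argument holds per-frame free energies (kJ/mol) evaluated on the same
    frames. Returns the mean of G_complex - G_A - G_B and its standard error.
    Requires at least two frames.
    """
    delta = (np.asarray(g_complex, dtype=float)
             - np.asarray(g_chain_a, dtype=float)
             - np.asarray(g_chain_b, dtype=float))
    sem = delta.std(ddof=1) / np.sqrt(len(delta))
    return float(delta.mean()), float(sem)
```

Comparing the two quantities between trajectories started from the open and closed states is one way to express "intermonomer interactions are strengthened": more contacts and a more negative mean binding energy in the closed state.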
G protein-coupled receptors (GPCRs) are important targets for drug discovery owing to their ability to respond to a broad range of stimuli and their involvement in numerous pathologies. Although traditional ligand-based and structure-based approaches have enabled the development of effective therapeutics for many GPCRs, these approaches often fall short when applied to receptors with limited ligand or structural data. This limitation highlights the need for strategies that can accurately predict ligand bioactivity across the entire GPCR family, especially for understudied receptor subtypes. In this study, we introduce BOLD-GPCRs (BERT-Optimized Ligand Discovery for GPCRs), a deep learning framework designed to improve the prediction of ligand bioactivity across class A GPCRs. Accessible via a user-friendly web interface, BOLD-GPCRs employs transfer learning and leverages curated data sets of known class A GPCR ligands, receptor sequences, and signaling-relevant mutations. By integrating dense neural network classifiers with transformer-based protein language models, BOLD-GPCRs captures complex relationships between receptor sequence/function and ligand activity. Our results demonstrate that BOLD-GPCRs achieves robust predictive performance for both ligand bioactivity and mutational effects across a broad range of class A GPCRs, underscoring its potential as a valuable tool for ligand discovery, especially for poorly characterized receptors.
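The general pattern of pairing a protein language model with a dense classifier can be sketched as follows. This is a minimal, untrained illustration under stated assumptions: receptor embeddings are taken as given (for example, per-residue outputs of a pretrained BERT-style model, mean-pooled into one vector), ligand features are a binary fingerprint, and the layer sizes, the `mean_pool` helper and the `DenseBioactivityClassifier` class are hypothetical, not the actual BOLD-GPCRs architecture.

```python
import numpy as np

def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Collapse (sequence_length, d) per-residue embeddings into one d-vector."""
    return per_residue.mean(axis=0)

def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

class DenseBioactivityClassifier:
    """Hypothetical two-layer dense classifier over [ligand features || receptor embedding]."""

    def __init__(self, ligand_dim: int, receptor_dim: int, hidden_dim: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        in_dim = ligand_dim + receptor_dim
        # Random initial weights; a real model would be trained on labeled ligand-receptor pairs
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden_dim), (hidden_dim, 1))
        self.b2 = np.zeros(1)

    def predict_proba(self, ligand_feats: np.ndarray, receptor_emb: np.ndarray) -> np.ndarray:
        """Return one activity probability per (ligand, receptor) row."""
        x = np.concatenate([ligand_feats, receptor_emb], axis=1)
        h = relu(x @ self.w1 + self.b1)
        return sigmoid(h @ self.w2 + self.b2)[:, 0]
```

Because the receptor enters only through its sequence embedding, a mutant receptor simply yields a different embedding vector, which is one plausible way such a model can be queried for mutational effects.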
The Atom surface site interaction point (AIP) approach, previously used to predict association constants for synthetic host–guest systems, has been extended to protein–ligand complexes. AIP descriptions of protein binding sites were obtained by combining a library of precomputed AIP descriptors for all protein functional groups with a graph-based substructure matching algorithm. The corresponding AIP descriptions of ligands were obtained directly by footprinting the molecular electrostatic potential surface calculated using density functional theory. These AIP descriptions were projected onto X-ray crystal structures of protein–ligand complexes to identify pairs of AIPs close enough in space to constitute an intermolecular interaction. The overall free energy of binding was calculated by summing the contribution of each AIP contact and the associated desolvation. Applied to the 94 complexes involving uncharged ligands in the CASF benchmark data set, the method achieves a Pearson correlation coefficient of 0.76 and an RMSD of 11 kJ mol⁻¹ for absolute free energies of binding.
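The contact-and-sum step described above can be sketched in plain Python. Everything quantitative here is a hypothetical stand-in: each AIP carries a signed interaction parameter `epsilon` (positive for donor-like, negative for acceptor-like sites, scaled so that a product of two values is in kJ mol⁻¹), a pairwise contribution is taken as the product of the two values, each AIP involved in at least one contact pays a fixed desolvation cost once, and the 3.5 Å contact cutoff is arbitrary. The published AIP energy functions and distance criteria are likely different.

```python
import math
from dataclasses import dataclass

@dataclass
class AIP:
    position: tuple[float, float, float]  # Cartesian coordinates, Å
    epsilon: float      # hypothetical signed interaction parameter (>0 donor-like, <0 acceptor-like)
    desolvation: float  # hypothetical desolvation cost of this AIP, kJ/mol

def find_contacts(protein_aips, ligand_aips, cutoff=3.5):
    """Return (i, j) index pairs of protein/ligand AIPs within `cutoff` Å."""
    return [
        (i, j)
        for i, p in enumerate(protein_aips)
        for j, l in enumerate(ligand_aips)
        if math.dist(p.position, l.position) <= cutoff
    ]

def binding_free_energy(protein_aips, ligand_aips, cutoff=3.5):
    """Sum pairwise AIP contact terms plus desolvation of every contacted AIP (kJ/mol)."""
    contacts = find_contacts(protein_aips, ligand_aips, cutoff)
    # Pairwise term: opposite-signed (complementary) AIPs give a negative, favourable product
    dg = sum(protein_aips[i].epsilon * ligand_aips[j].epsilon for i, j in contacts)
    # Desolvation is charged once per AIP that participates in any contact
    dg += sum(protein_aips[i].desolvation for i in {i for i, _ in contacts})
    dg += sum(ligand_aips[j].desolvation for j in {j for _, j in contacts})
    return dg
```

Charging desolvation per AIP rather than per contact avoids double-counting when a single site touches several partners, which is one reasonable reading of "each AIP contact and associated desolvation".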
Drug discovery and medicinal chemistry efforts are increasingly influenced by machine learning (ML), with compound property prediction as a central application. ML models have demonstrated strong performance in predicting various compound properties from chemical structure. However, their prediction errors can vary considerably, making uncertainty quantification (UQ) essential for informed decision-making. Standard UQ metrics include the distance to molecules in the training set and the prediction variance, obtained through methods such as model ensembles or Bayesian modeling. Although several UQ methodologies have been developed in recent years, no single approach has consistently outperformed the others. Herein, we present a comprehensive benchmark of UQ strategies for ML-based prediction of absorption, distribution, metabolism, and excretion (ADME) properties, using both in-house and public data sets. We employed the recently introduced UNIQUE (UNcertaInty QUantification bEnchmarking) framework and evaluated UQ method performance under data shifts. Our findings indicate that data-based UQ metrics (e.g., chemical distance) and model-based UQ metrics (e.g., predicted value and variance) may capture complementary aspects of uncertainty. Combining them through error models, which are designed to predict the original ML model's error, yielded higher-quality uncertainty estimates. These error models emerged as a promising strategy for enhancing UQ, proving robust under various degrees and types of data shift. Taken together, our work highlights the potential of combining diverse UQ metrics and error modeling to improve reliability in molecular property prediction. By establishing standardized evaluation setups and assessing UQ under data shifts, we provide a foundation for future UQ method development and benchmarking in the field.
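The idea of combining data-based and model-based UQ metrics in an error model can be sketched with NumPy. This is an illustration under assumptions, not the UNIQUE framework's API: fingerprints are represented as sets of on-bit indices, the data-based metric is one minus the nearest-neighbor Tanimoto similarity to the training set, model-based metrics come from an ensemble's mean and variance, and the error model is a plain linear least-squares fit of absolute error on those features (the study's error models may use other learners). In practice the error model should be fit on a held-out calibration set, not on the data the original model was trained on.

```python
import numpy as np

def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def distance_to_training_set(query_fp: set[int], train_fps: list[set[int]]) -> float:
    """Data-based UQ metric: 1 - similarity to the nearest training compound."""
    return 1.0 - max(tanimoto(query_fp, t) for t in train_fps)

def ensemble_stats(predictions):
    """Model-based UQ metrics from an (n_models, n_samples) prediction array."""
    p = np.asarray(predictions, dtype=float)
    return p.mean(axis=0), p.var(axis=0)

def fit_error_model(features, abs_errors):
    """Linear error model: |error| ~ intercept + features @ weights (least squares)."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, np.asarray(abs_errors, dtype=float), rcond=None)
    return coef

def predict_error(coef, features):
    """Predicted absolute error, usable as a combined uncertainty score."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef
```

Here each row of `features` would hold, for one compound, its distance to the training set, the ensemble's predicted value and the ensemble variance, so the fitted error model weighs the complementary signals against each other.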

