Pub Date: 2026-03-23 DOI: 10.1021/acs.jcim.5c02858
Vincent Fan, Regina Barzilay
The performance of machine-learning models in drug discovery is highly dependent on the quality and consistency of the training data. Due to limitations in data set sizes, many models are trained by aggregating bioactivity data from diverse sources, including public databases such as ChEMBL. However, this approach often introduces significant noise due to variability in experimental protocols. We introduce AssayMatch, a framework for data selection that builds smaller, more homogeneous training sets attuned to the test set of interest. AssayMatch leverages data attribution methods to quantify the contribution of each training assay to the model's performance. These attribution scores are used to fine-tune language embeddings of text-based assay descriptions to capture not just semantic similarity but also the compatibility between assays. Unlike existing data attribution methods, our approach enables data selection for a test set with unknown labels, mirroring real-world drug discovery campaigns in which the activities of candidate molecules are not known in advance. At test time, embeddings fine-tuned with AssayMatch are used to rank all available training data. We demonstrate that models trained on data selected by AssayMatch are able to surpass the performance of the model trained on the complete data set, highlighting its ability to effectively filter out harmful or noisy experiments. We perform experiments on two common machine-learning architectures and see increased prediction capability over a strong language-only baseline for 8/12 model-target pairs. AssayMatch provides a data-driven mechanism to curate higher-quality data sets, reducing noise from incompatible experiments and improving the predictive power and data efficiency of models for drug discovery. AssayMatch is available at https://github.com/Ozymandias314/AssayMatch.
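The test-time ranking step described in the abstract can be pictured with a small sketch. This is not the authors' implementation: it assumes each assay description has already been turned into a fixed-length embedding, and it uses cosine similarity as a stand-in scoring rule. The function names and toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def rank_training_assays(test_embedding, train_embeddings, top_k):
    """Return the IDs of the top_k training assays, most similar first.

    train_embeddings maps an assay ID to its embedding vector.
    """
    scored = sorted(
        train_embeddings.items(),
        key=lambda item: cosine(test_embedding, item[1]),
        reverse=True,
    )
    return [assay_id for assay_id, _ in scored[:top_k]]


# Toy 3-dimensional embeddings (hypothetical values, not from the paper).
train = {
    "assay_A": [1.0, 0.0, 0.0],
    "assay_B": [0.7, 0.7, 0.0],
    "assay_C": [0.0, 0.0, 1.0],
}
test = [0.9, 0.1, 0.0]
selected = rank_training_assays(test, train, top_k=2)
```

Only the test assay's description is needed here, not its activity labels, which matches the unknown-label setting the abstract emphasizes.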
{"title":"AssayMatch: Learning To Select Data for Molecular Activity Models.","authors":"Vincent Fan, Regina Barzilay","doi":"10.1021/acs.jcim.5c02858","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02858","url":null,"abstract":"<p><p>The performance of machine-learning models in drug discovery is highly dependent on the quality and consistency of the training data. Due to limitations in data set sizes, many models are trained by aggregating bioactivity data from diverse sources, including public databases such as ChEMBL. However, this approach often introduces significant noise due to variability in experimental protocols. We introduce AssayMatch, a framework for data selection that builds smaller, more homogeneous training sets attuned to the test set of interest. AssayMatch leverages data attribution methods to quantify the contribution of each training assay to the model's performance. These attribution scores are used to fine-tune language embeddings of text-based assay descriptions to capture not just semantic similarity but also the compatibility between assays. Unlike existing data attribution methods, our approach enables data selection for a test set with unknown labels, mirroring real-world drug discovery campaigns in which the activities of candidate molecules are not known in advance. At test time, embeddings fine-tuned with AssayMatch are used to rank all available training data. We demonstrate that models trained on data selected by AssayMatch are able to surpass the performance of the model trained on the complete data set, highlighting its ability to effectively filter out harmful or noisy experiments. We perform experiments on two common machine-learning architectures and see increased prediction capability over a strong language-only baseline for 8/12 model-target pairs. 
AssayMatch provides a data-driven mechanism to curate higher-quality data sets, reducing noise from incompatible experiments and improving the predictive power and data efficiency of models for drug discovery. AssayMatch is available at https://github.com/Ozymandias314/AssayMatch.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2026-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147496940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TADF emitter performance depends on both thermodynamic and kinetic factors. We analyzed 747 experimentally known TADF molecules computationally to extract quantitative design guidelines. Using a validated xTB-based workflow, we examine how architecture, geometry, and electronic structure affect the photophysical properties. Among architectures, D-A-D frameworks achieve the smallest ΔEST. A favorable torsional angle of 50°-90° balances small ΔEST with the spin-orbit coupling needed for reverse intersystem crossing. Clustering separates high-performance candidates and highlights multiresonance emitters for blue emission. From these results, we identify 127 candidates with predicted ΔEST < 0.1 eV and oscillator strength f > 0.1. These HTVS-derived design guidelines and candidates can guide future TADF emitter development.
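The final screening criterion (predicted ΔEST < 0.1 eV and oscillator strength f > 0.1) amounts to a two-threshold filter. The sketch below illustrates only that filter; the `Emitter` record, the function name, and the example values are hypothetical and are not taken from the study's data set.

```python
from dataclasses import dataclass


@dataclass
class Emitter:
    name: str
    delta_e_st_ev: float  # predicted singlet-triplet gap, in eV
    osc_strength: float   # oscillator strength f (dimensionless)


def select_candidates(emitters, max_gap_ev=0.1, min_f=0.1):
    """Keep emitters with a gap below max_gap_ev and f above min_f."""
    return [
        e for e in emitters
        if e.delta_e_st_ev < max_gap_ev and e.osc_strength > min_f
    ]


# Hypothetical molecules used only to exercise the filter.
molecules = [
    Emitter("mol_1", 0.05, 0.25),  # passes both thresholds
    Emitter("mol_2", 0.30, 0.40),  # gap too large
    Emitter("mol_3", 0.08, 0.02),  # oscillator strength too small
]
hits = select_candidates(molecules)
```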
{"title":"Data-Driven Design Guidelines for TADF Emitters from a High-Throughput Screening of 747 Molecules.","authors":"Jean-Pierre Tchapet Njafa, Elvira Vanelle Kameni Tcheuffa, Aissatou Foumkpou Maghame, Serge Guy Nana Engo","doi":"10.1021/acs.jcim.5c03068","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c03068","url":null,"abstract":"<p><p>TADF emitter performance depends on both thermodynamic and kinetic factors. We analyzed 747 experimentally known TADF molecules computationally to extract quantitative design guidelines. Using a validated xTB-based workflow, we examine how architecture, geometry, and electronic structure affect the photophysical properties. Among architectures, D-A-D frameworks achieve the smallest Δ<i>E</i><sub>ST</sub>. A favorable torsional angle of 50°-90° balances small Δ<i>E</i><sub>ST</sub> with the spin-orbit coupling needed for reverse intersystem crossing. Clustering separates high-performance candidates and highlights multiresonance emitters for blue emission. From these results, we identify 127 candidates with predicted Δ<i>E</i><sub>ST</sub> < 0.1 eV and oscillator strength <i>f</i> > 0.1. These HTVS-derived design guidelines and candidates can guide future TADF emitter development.</p>","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.3,"publicationDate":"2026-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147502826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-22 DOI: 10.1021/acs.jcim.5c03075
Muhammad Luthfi, Adam J. Simpkin, Luc G. Elliott, Pornthep Sompornpisut, Daniel J. Rigden
Stereochemistry violations in AlphaFold 3 models are more prevalent than currently appreciated. Analysis of 900 carbohydrate ligands revealed that 85.8% have errors, mainly in chirality but also including bond conversions (15.2%), planar ring distortions (3.9%), aromatic ring formations (2.5%), and improper structural configurations (0.9%). Boltz-1x reduced most violations dramatically but increased improper configurations to 22.1%, notably in N-acetyl-α-neuraminic acid. The BondedAtomPairs protocol reduced stereochemical issues but lost the reducing-end anomeric oxygen, highlighting ongoing challenges in accurate carbohydrate modeling.
{"title":"Physical Implausibility of Carbohydrate Ligands in Results of Deep Learning-Based Cofolding Methods","authors":"Muhammad Luthfi,Adam J. Simpkin,Luc G. Elliott,Pornthep Sompornpisut,Daniel J. Rigden","doi":"10.1021/acs.jcim.5c03075","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c03075","url":null,"abstract":"Stereochemistry violations in AlphaFold 3 models are more prevalent than currently appreciated. Analysis of 900 carbohydrate ligands revealed that 85.8% have errors, mainly in chirality but also including bond conversions (15.2%), planar ring distortions (3.9%), aromatic ring formations (2.5%), and improper structural configurations (0.9%). Boltz-1x reduced most violations dramatically but increased improper configurations to 22.1%, notably in N-acetyl-α-neuraminic acid. The BondedAtomPairs protocol reduced stereochemical issues but lost the reducing-end anomeric oxygen, highlighting ongoing challenges in accurate carbohydrate modeling.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"17 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147493152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-21 DOI: 10.1021/acs.jcim.5c02308
Pedro M. Hernández, Carlos A. Arango, Soo-Kyung Kim, Andrés Jaramillo-Botero, William A. Goddard III
We propose an atomistic mechanism by which key plant processes, including seed dormancy, root elongation, secondary root proliferation, and flower and fruit production, are regulated. This regulation occurs through binding of the phytohormone abscisic acid (ABA) to the plant G protein-coupled receptor (GPCR) GCR1. This mirrors the central role of GPCRs in animal systems, where they mediate vision, taste, olfaction, pain perception, and neurotransmission. Establishing GCR1 as a bona fide GPCR in plants would represent a transformative advance in plant biology and agriculture. In particular, GCR1 would be shown to transduce ABA signals through interaction with the Gα subunit (GPA1). However, direct experimental evidence for this interaction, and confirmation that ABA binding to GCR1 modulates GPA1 inactivation, remain elusive. A major obstacle in testing these hypotheses is the lack of structural data on GPA1 interactions within the ABA-GCR1 complex. To address this gap, we employ molecular dynamics (MD) and metadynamics simulations based on the AMBER and CHARM31 force fields to characterize atomistically the ABA-GCR1-GPA1 ternary complex. Our MD simulations reveal an allosteric mechanism whereby GCR1-ABA binding induces a rigid-body closure of the GPA1 Ras and α-helical domains, creating a steric blockade that traps GDP in the nucleotide-binding pocket. This conformation prevents GTP exchange and maintains GPA1 in an inactive state, effectively terminating the signaling cascade. Free energy landscape analysis further demonstrates that this closed state represents a deep energy minimum, suggesting biological relevance as a regulatory mechanism. We propose specific mutations in the ABA-binding site of GCR1 and at the GCR1-GPA1 interface that could experimentally validate (or refute) our proposed mechanism. Confirmation of this model would pave the way for designing novel agonists and inverse agonists to precisely manipulate critical plant processes.
{"title":"The Atomistic Mechanism Underlying Regulation of the GPA1 G Protein Signaling Pathway Mediated by Abscisic Acid (ABA) Phytohormone Binding to the GCR1 Plant G Protein Coupled Receptor","authors":"Pedro M. Hernández,Carlos A. Arango,Soo-Kyung Kim,Andrés Jaramillo-Botero,William A. Goddard III","doi":"10.1021/acs.jcim.5c02308","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02308","url":null,"abstract":"We propose an atomistic mechanism by which key plant processes, including seed dormancy, root elongation, secondary root proliferation, and flower and fruit production, are regulated. This regulation occurs through binding of the phytohormone abscisic acid (ABA) to the plant G protein-coupled receptor (GPCR) GCR1. This mirrors the central role of GPCRs in animal systems, where they mediate vision, taste, olfaction, pain perception, and neurotransmission. Establishing GCR1 as a bona fide GPCR in plants would represent a transformative advance in plant biology and agriculture. In particular, GCR1 would be shown to transduce ABA signals through interaction with the Gα subunit (GPA1). However, direct experimental evidence for this interaction, and confirmation that ABA binding to GCR1 modulates GPA1 inactivation, remain elusive. A major obstacle in testing these hypotheses is the lack of structural data on GPA1 interactions within the ABA-GCR1 complex. To address this gap, we employ molecular dynamics (MD) and metadynamics simulations based on the AMBER and CHARM31 force fields to characterize atomistically the ABA-GCR1-GPA1 ternary complex. Our MD simulations reveal an allosteric mechanism whereby GCR1-ABA binding induces a rigid-body closure of the GPA1 Ras and α-helical domains, creating a steric blockade that traps GDP in the nucleotide-binding pocket. This conformation prevents GTP exchange and maintains GPA1 in an inactive state, effectively terminating the signaling cascade. 
Free energy landscape analysis further demonstrates that this closed state represents a deep energy minimum, suggesting biological relevance as a regulatory mechanism. We propose specific mutations in the ABA-binding site of GCR1 and at the GCR1-GPA1 interface that could experimentally validate (or refute) our proposed mechanism. Confirmation of this model would pave the way for designing novel agonists and inverse agonists to precisely manipulate critical plant processes.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"13 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147493150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-20 DOI: 10.1021/acs.jcim.5c03108
O Pavela, A Wacha, T Beke-Somfai, A K Sieradzan
Coarse-grained simulations of foldamers such as β-peptides require force fields that accurately capture the backbone geometry and flexibility. In this work, we extend the UNRES coarse-grained model to β3-peptides by reparameterizing key local potential terms: virtual-bond stretching, virtual bond-angle bending, and torsional potentials. The bond-stretching term was derived from probability distributions obtained via all-atom molecular dynamics simulations of a reference β-peptide, while the angular and torsional potentials were fitted to quantum chemical potential energy surfaces computed by using the GFN2-xTB method with implicit solvent. Analytical potential forms were used to model the energy landscapes, and coefficients were obtained via nonlinear fitting to the potential of mean forces (PMFs). The modified UNRES model was validated through coarse-grained simulations and compared to the all-atom reference in terms of structural properties such as radius of gyration, end-to-end distances, and intramolecular side-chain separations. The capacity of the extended force field to reproduce β-peptide helical conformations was also evaluated with a peptide. Furthermore, the ability of the model to reproduce peptide self-assembly was evaluated using two peptides, one that is known to form large aggregates in aqueous solution and another that does not. The simulations successfully recapitulated these experimentally observed behaviors. Overall, the results demonstrate that the newly derived local potentials for β-amino acids can capture overall peptide behavior, making the model suitable for predictive simulations of β-peptide folding and aggregation.
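One common way to turn a sampled distance distribution into a potential of mean force is Boltzmann inversion, U(d) = -k_B T ln P(d). The abstract does not spell out the exact procedure, so the sketch below is an assumption about the general technique rather than the authors' workflow; the histogram counts, the 300 K temperature, and the function name are hypothetical.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_inversion(counts, temperature=300.0):
    """Convert histogram counts into a PMF in kcal/mol.

    The PMF is shifted so that its lowest value is zero. Bins with no
    samples get an infinite energy because ln(0) is undefined.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram is empty")
    pmf = []
    for c in counts:
        if c == 0:
            pmf.append(math.inf)
        else:
            pmf.append(-K_B * temperature * math.log(c / total))
    lowest = min(pmf)
    return [p - lowest for p in pmf]


# Hypothetical virtual-bond length histogram (three bins).
counts = [10, 100, 10]
pmf = boltzmann_inversion(counts)
```

In a parametrization workflow, an analytical form such as a harmonic term would then be fitted to the resulting points; that fitting step is omitted here.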
{"title":"Parametrization of β3-Peptides for Coarse-Grained Molecular Dynamics Simulations.","authors":"O Pavela,A Wacha,T Beke-Somfai,A K Sieradzan","doi":"10.1021/acs.jcim.5c03108","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c03108","url":null,"abstract":"Coarse-grained simulations of foldamers such as β-peptides require force fields that accurately capture the backbone geometry and flexibility. In this work, we extend the UNRES coarse-grained model to β3-peptides by reparameterizing key local potential terms: virtual-bond stretching, virtual bond-angle bending, and torsional potentials. The bond-stretching term was derived from probability distributions obtained via all-atom molecular dynamics simulations of a reference β-peptide, while the angular and torsional potentials were fitted to quantum chemical potential energy surfaces computed by using the GFN2-xTB method with implicit solvent. Analytical potential forms were used to model the energy landscapes, and coefficients were obtained via nonlinear fitting to the potential of mean forces (PMFs). The modified UNRES model was validated through coarse-grained simulations and compared to the all-atom reference in terms of structural properties such as radius of gyration, end-to-end distances, and intramolecular side-chain separations. The capacity of the extended force field to reproduce β-peptide helical conformations was also evaluated with a peptide. Furthermore, the ability of the model to reproduce peptide self-assembly was evaluated using two peptides, one that is known to form large aggregates in aqueous solution and another that does not. The simulations successfully recapitulated these experimentally observed behaviors. 
Overall, the results demonstrate that the newly derived local potentials for β-amino acids can capture overall peptide behavior, making the model suitable for predictive simulations of β-peptide folding and aggregation.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"12 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147483415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate and reliable prediction of antibody-antigen binding interactions informed by affinity measurements remains an important challenge in chemical information modeling, with growing concern over the reliability and calibration of confidence estimates in data-driven predictions. Here, we present Trans-GP, a sequence-driven framework that integrates frozen protein language model embeddings with a Gaussian process classifier to jointly perform affinity-informed binary binding classification and quantitative uncertainty calibration. Across multiple benchmark data sets, including SAbDab, SKEMPI2.0, and ABbind, Trans-GP achieves competitive predictive performance while consistently improving calibration quality relative to conventional neural network models. By providing statistically well-calibrated probabilistic confidence estimates, Trans-GP supports reliable screening and prioritization of antibody candidates in chemical information workflows.
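Calibration quality for a binary classifier is often summarized with the expected calibration error (ECE). The abstract does not name the metric it uses, so this sketch is an illustration of one standard choice, not the paper's evaluation code. It bins the predicted probability of binding and compares it with the observed fraction of binders in each bin; the function name and toy inputs are hypothetical.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean of |observed positive rate - mean predicted probability|.

    probs  : predicted probabilities of the positive class, in [0, 1]
    labels : true labels, 0 or 1
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # A probability of exactly 1.0 falls into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(frac_pos - mean_prob)
    return ece


# Well calibrated: 0.8 predicted, 4 of 5 are binders.
ece_good = expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0])
# Badly calibrated: 0.9 predicted, none are binders.
ece_bad = expected_calibration_error([0.9, 0.9], [0, 0])
```

A lower value means the predicted probabilities track observed frequencies more closely.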
{"title":"Trans-GP: Uncertainty-Calibrated Antibody-Antigen Binding Classification Using Protein Language Models.","authors":"Lilan Lv,Xueli Meng,Jinxiong Zhang,Yan Chen,Chunyan Tang,Songjian Wei","doi":"10.1021/acs.jcim.6c00127","DOIUrl":"https://doi.org/10.1021/acs.jcim.6c00127","url":null,"abstract":"Accurate and reliable prediction of antibody-antigen binding interactions informed by affinity measurements remains an important challenge in chemical information modeling, with growing concern over the reliability and calibration of confidence estimates in data-driven predictions. Here, we present Trans-GP, a sequence-driven framework that integrates frozen protein language model embeddings with a Gaussian process classifier to jointly perform affinity-informed binary binding classification and quantitative uncertainty calibration. Across multiple benchmark data sets, including SAbDab, SKEMPI2.0, and ABbind, Trans-GP achieves competitive predictive performance while consistently improving calibration quality relative to conventional neural network models. By providing statistically well-calibrated probabilistic confidence estimates, Trans-GP supports reliable screening and prioritization of antibody candidates in chemical information workflows.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"9 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147483411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-20 DOI: 10.1021/acs.jcim.6c00101
Peixuan Li, Weifu Wang, Dong-Jun Yu
Precise determination of protein functions is essential for elucidating cellular processes and pathological mechanisms, thereby facilitating targeted drug design. Although wet-lab experimental methods remain the gold standard to determine protein functions, their long turnaround times, high costs, and labor-intensive procedures make them impractical for large-scale annotation. Here, we introduced RCHGO, a novel deep-learning framework designed to infer Gene Ontology (GO) annotations directly from protein sequences through leveraging residual graph convolutional networks (RGCNs) equipped with cross-attention mechanisms. Comprehensive benchmarking on 1,493 nonredundant proteins demonstrates that RCHGO achieves superior performance compared with 16 state-of-the-art methods. Detailed analyses indicate that the superior performance of RCHGO arises from its two deep learning modules, which separately exploit complementary manually crafted and protein language model-based feature representations and are effectively fused at the decision level. Meanwhile, the integration of RGCNs and cross-attention modules enables the model to learn rich protein- and residue-level representations and align them effectively with GO semantics. The source code of RCHGO is publicly accessible at https://github.com/peixuanli123/RCHGO.
{"title":"Leveraging Residual Graph Convolutional Networks with Cross-Attention Mechanisms for High-Accuracy Protein Function Prediction.","authors":"Peixuan Li,Weifu Wang,Dong-Jun Yu","doi":"10.1021/acs.jcim.6c00101","DOIUrl":"https://doi.org/10.1021/acs.jcim.6c00101","url":null,"abstract":"Precise determination of protein functions is essential for elucidating cellular processes and pathological mechanisms, thereby facilitating targeted drug design. Although wet-lab experimental methods remain the gold standard to determine protein functions, their long turnaround times, high costs, and labor-intensive procedures make them impractical for large-scale annotation. Here, we introduced RCHGO, a novel deep-learning framework designed to infer Gene Ontology (GO) annotations directly from protein sequences through leveraging residual graph convolutional networks (RGCNs) equipped with cross-attention mechanisms. Comprehensive benchmarking on 1,493 nonredundant proteins demonstrates that RCHGO achieves superior performance compared with 16 state-of-the-art methods. Detailed analyses indicate that the superior performance of RCHGO arises from its two deep learning modules, which separately exploit complementary manually crafted and protein language model-based feature representations and are effectively fused at the decision level. Meanwhile, the integration of RGCNs and cross-attention modules enables the model to learn rich protein- and residue-level representations and align them effectively with GO semantics. 
The source code of RCHGO is publicly accessible at https://github.com/peixuanli123/RCHGO.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"18 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147490181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-20 DOI: 10.1021/acs.jcim.5c02755
Stefan C Pate, Eric H Wang, Linda J Broadbelt, Keith E J Tyo
Uncharacterized functions of enzymes represent an untapped opportunity to develop therapeutics, unlock the sustainable synthesis of materials, and understand the evolution of life-sustaining metabolic networks. Uncharacterized enzymes and reactions, generated by protein language models and computer-aided synthesis tools, respectively, make up a large part of this opportunity. Given the technical complexity of high-throughput enzymatic activity screens, predictive models are needed that can prescreen enzyme-reaction pairs in silico. We present (1) a high-quality data set of enzyme-reaction pairs, (2) a rigorous battery of model evaluations varying in their approaches to data splitting and negative sampling, (3) a comprehensive benchmarking of enzyme-reaction models, and (4) a pair of parameter-efficient, data-efficient, high-performing models called Reaction-Center Graph Neural Networks (RC-GNNs) capable of predicting whether an enzyme, represented by an amino acid sequence, can significantly catalyze a given reaction, represented by its full set of reactants and products. In the most difficult conditions, where the query reactions were highly dissimilar from those present in the training data set, our models achieved 0.88 and 0.84 ROC-AUC on classification tasks featuring globally selected and synthetic negatives, respectively. On a time-based split, an RC-GNN achieved 0.91 ROC-AUC. The ability to successfully make predictions on enzymes and reactions distinct from those used during training makes the RC-GNNs especially useful for both metabolic engineers and evolutionary biologists who need to reason about uncharacterized enzymatic reactions.
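The ROC-AUC values quoted in the abstract have a simple interpretation: the probability that a randomly chosen positive enzyme-reaction pair is scored above a randomly chosen negative one. The sketch below computes that quantity directly from pairwise comparisons. It is a generic illustration with hypothetical scores, not the benchmark code from the paper.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This runs in O(n_pos * n_neg) time, which is fine for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for four enzyme-reaction pairs.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

How the negatives are chosen (globally selected versus synthetic, in the abstract's terms) changes which pairs enter this comparison, which is why the reported values differ between the two settings.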
{"title":"Development of Reaction-Centered Encoders and Benchmarking of Enzyme-Reaction Pair Models.","authors":"Stefan C Pate,Eric H Wang,Linda J Broadbelt,Keith E J Tyo","doi":"10.1021/acs.jcim.5c02755","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c02755","url":null,"abstract":"Uncharacterized functions of enzymes represent an untapped opportunity to develop therapeutics, unlock the sustainable synthesis of materials, and understand the evolution of life-sustaining metabolic networks. Uncharacterized enzymes and reactions, generated by protein language models and computer-aided synthesis tools, respectively, make up a large part of this opportunity. Given the technical complexity of high-throughput enzymatic activity screens, predictive models are needed that can prescreen enzyme-reaction pairs in silico. We present (1) a high-quality data set of enzyme-reaction pairs, (2) a rigorous battery of model evaluations varying in their approaches to data splitting and negative sampling, (3) a comprehensive benchmarking of enzyme-reaction models, and (4) a pair of parameter-efficient, data-efficient, high-performing models called Reaction-Center Graph Neural Networks (RC-GNNs) capable of predicting whether an enzyme, represented by an amino acid sequence, can significantly catalyze a given reaction, represented by its full set of reactants and products. In the most difficult conditions, where the query reactions were highly dissimilar from those present in the training data set, our models achieved 0.88 and 0.84 ROC-AUC on classification tasks featuring globally selected and synthetic negatives, respectively. On a time-based split, an RC-GNN achieved 0.91 ROC-AUC. 
The ability to successfully make predictions on enzymes and reactions distinct from those used during training makes the RC-GNNs especially useful for both metabolic engineers and evolutionary biologists who need to reason about uncharacterized enzymatic reactions.","PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":"167 1","pages":""},"PeriodicalIF":5.6,"publicationDate":"2026-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147490179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-03-19 DOI: 10.1021/acs.jcim.6c00365
Ayhan Aydın, Ümit Kaya Eryılmaz, Onur Bahattin Alkan, Pınar Kocagöz, Fatih Ekinci, Mehmet Serdar Güzel
Accurate prediction of the electronic band gap is essential for accelerating the discovery and design of semiconducting and energy materials. Conventional density functional theory (DFT) methods, while physically rigorous, remain computationally expensive and limited in scalability. In this study, we propose a hybrid artificial intelligence framework that combines graph-based deep learning embeddings with classical machine learning algorithms to achieve high-accuracy, interpretable, and computationally efficient band gap prediction. The model integrates embeddings obtained from CGCNN, MEGNet, and SchNet architectures with physically meaningful crystal descriptors (including maximum electronegativity, crystal system, space group, and spin-orbit coupling) and trains them using optimized gradient-boosting and neural architectures. Trained on 136,000 crystal structures from the Materials Project database, the hybrid model achieves R² = 0.921, MAE = 0.191, and MSE = 0.155, outperforming both classical models (Ward et al., 2016) and standalone graph neural networks such as CGCNN (Xie and Grossman, 2018). The achieved accuracy is statistically comparable to the state-of-the-art ALIGNN model (Choudhary et al., 2021), while requiring lower computational resources and offering enhanced generalization due to the integration of multisource structural information. SHAP-based interpretability analysis highlights that the model captures physically consistent relationships, with metallicity and magnetic site features emerging as dominant factors in band gap prediction. These findings demonstrate that the synergy between deep structural embeddings and classical algorithms provides a powerful, scalable approach for materials informatics. The proposed framework establishes a foundation for multiproperty prediction, transfer learning across databases, and inverse materials design driven by interpretable artificial intelligence.
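The three figures of merit quoted above (the coefficient of determination, MAE, and MSE) follow standard definitions, sketched below for reference. The function name and the toy band gap values are hypothetical; the abstract does not state units, so none are assumed here.

```python
def regression_metrics(y_true, y_pred):
    """Return the coefficient of determination, MAE and MSE as a dict."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all targets are identical")
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "mse": mse}


# Hypothetical reference and predicted band gaps.
reference = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
metrics = regression_metrics(reference, predicted)
```

Because MSE squares each error, a few large misses raise it much more than they raise MAE, so the two numbers are worth reading together.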
{"title":"Hybrid Graph-Machine Learning Framework for Accurate and Interpretable Band Gap Prediction.","authors":"Ayhan Aydın, Ümit Kaya Eryılmaz, Onur Bahattin Alkan, Pınar Kocagöz, Fatih Ekinci, Mehmet Serdar Güzel","doi":"10.1021/acs.jcim.6c00365","DOIUrl":"https://doi.org/10.1021/acs.jcim.6c00365","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2026-03-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"147483419","RegionNum":2,"RegionCategory":"Chemistry"}
Pub Date: 2026-03-19 DOI: 10.1021/acs.jcim.6c00063
M Soledad Labanda, Sofia Noli Truant, Marisa M Fernández, Enrique Rosenbaum, Andrés Venturino, Luciana Capece
Acetylcholinesterase (AChE) is a cholinergic enzyme that hydrolyzes acetylcholine to terminate neurotransmission. Inhibition of AChE prevents the breakdown of acetylcholine, leading to its accumulation and thereby providing therapeutic relief for memory deficits in Alzheimer's disease. While the inhibitory effects of synthetic ligands on AChE have been widely studied, the modulation of its activity by endogenous polyamines such as spermine and putrescine remains poorly understood at the molecular level. Previous kinetic studies have shown that polyamines can modulate AChE activity, exhibiting an inhibitory effect at substrate concentrations below ∼200 μM. In this work, we characterized the binding modes of polyamines to AChE using molecular dynamics simulations and binding free energy calculations, and measured the dissociation constants by surface plasmon resonance. Our results show that spermine and putrescine bind to the active-site gorge of AChE by interacting with residues of the peripheral anionic site, the catalytic site, and other important residues within the gorge. As a consequence, they block the pathway of the substrate toward the active site. This theoretical approach helps explain the mechanism responsible for the experimentally observed inhibitory effects of polyamines on AChE activity.
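The abstract above pairs computed binding free energies with dissociation constants measured by surface plasmon resonance. The two quantities are linked by the standard thermodynamic relation ΔG° = RT ln(Kd / c°), with c° = 1 M. The Python sketch below shows that conversion under stated assumptions: the function names and the default temperature of 298.15 K are illustrative choices, and no Kd or ΔG value from the paper is used, since the abstract reports none.

```python
import math

# Molar gas constant in kcal/(mol*K): 8.314462618 J/(mol*K) / 4184 J/kcal.
GAS_CONSTANT_KCAL = 1.987204e-3


def kd_to_delta_g(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    Uses dG = R * T * ln(Kd / 1 M); Kd must be given in mol/L.
    Tighter binding (smaller Kd) gives a more negative dG.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


def delta_g_to_kd(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Inverse conversion: dissociation constant (mol/L) from dG (kcal/mol)."""
    return math.exp(delta_g_kcal / (GAS_CONSTANT_KCAL * temperature_k))
```

With these hypothetical helpers, a micromolar binder (`kd_to_delta_g(1e-6)`) comes out near -8.2 kcal/mol at room temperature. Computed free energies from simulation can differ from such experimentally derived values by more than the measurement error, so the conversion is best read as a consistency check between the two approaches rather than a validation of either.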
{"title":"Polyamine Binding to Acetylcholinesterase Revealed by Molecular Dynamics and Surface Plasmon Resonance.","authors":"M Soledad Labanda, Sofia Noli Truant, Marisa M Fernández, Enrique Rosenbaum, Andrés Venturino, Luciana Capece","doi":"10.1021/acs.jcim.6c00063","DOIUrl":"https://doi.org/10.1021/acs.jcim.6c00063","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2026-03-19","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"147483484","RegionNum":2,"RegionCategory":"Chemistry"}