Pub Date: 2025-03-24; Epub Date: 2025-03-10; DOI: 10.1021/acs.jcim.4c01997
Artur Bille, Victor Buchstaber, Evgeny Spodarev
Fullerenes are hollow carbon molecules where each atom is connected to exactly three other atoms, arranged in pentagonal and hexagonal rings. Mathematically, they can be combinatorially modeled as planar, 3-regular graphs with facets composed only of pentagons and hexagons. In this work, we outline a few of the many open questions about fullerenes, beginning with the problem of generating fullerenes randomly. We then introduce an infinite family of fullerenes on which the generalized Stone-Wales operation is inapplicable. Furthermore, we present numerical insights into a graph invariant, called the character of a fullerene, derived from its adjacency and degree matrices. As supported by numerical results, this descriptor may lead to a new method for linear enumeration of all fullerenes.
Title: Some Open Mathematical Problems on Fullerenes. Journal of Chemical Information and Modeling, pp. 2911–2923. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11938279/pdf/
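The graph model in the fullerene abstract above already fixes the face counts of every fullerene. As a worked illustration (a standard consequence of Euler's formula, not a result claimed by this paper), a fullerene with n vertices has 3n/2 edges, exactly 12 pentagons and n/2 − 10 hexagons. The sketch below computes these counts; the function name `fullerene_face_counts` is hypothetical.

```python
def fullerene_face_counts(n: int) -> dict:
    """Return edge and face counts for a fullerene graph with n vertices.

    Uses Euler's formula V - E + F = 2 for a planar 3-regular graph whose
    faces are only pentagons (p) and hexagons (h):
      E = 3n/2       (every vertex has degree 3, each edge has two ends)
      2E = 5p + 6h   (each edge borders exactly two faces)
    Solving gives p = 12 and h = n/2 - 10.
    """
    # Fullerenes exist only for n = 20 and every even n >= 24 (none with 22 vertices).
    if n % 2 != 0 or n < 20 or n == 22:
        raise ValueError(f"no fullerene with {n} vertices")
    edges = 3 * n // 2
    faces = 2 - n + edges          # Euler's formula rearranged for F
    pentagons = 12
    hexagons = faces - pentagons   # equals n // 2 - 10
    # Sanity check: every edge borders exactly two faces.
    assert 5 * pentagons + 6 * hexagons == 2 * edges
    return {"edges": edges, "pentagons": pentagons, "hexagons": hexagons}


print(fullerene_face_counts(60))
```

For n = 60 this returns 90 edges, 12 pentagons and 20 hexagons, the counts of the truncated-icosahedral C60 cage.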
Pub Date: 2025-03-24; DOI: 10.1021/acs.jcim.5c00126
Minghao Liu, Kaiyu Wang, Yan Zhang, Xue Zhou, Wannan Li*, and Weiwei Han*
Bioactive peptides from food sources offer a safe and biocompatible approach to enzyme inhibition, with potential applications in managing metabolic disorders such as hyperuricemia and gout, conditions linked to excessive xanthine oxidase activity. Using a machine learning-based screening approach inspired by the bioactivity of natto, two peptides, ECFK and FECK, were identified from the Bacillus subtilis proteome and validated as xanthine oxidase inhibitors with IC50 values of 37.36 and 71.57 mM, respectively. Further experiments confirmed their safety through cytotoxicity assays, and electronic tongue analysis demonstrated their mild sensory properties, supporting their edibility. Molecular dynamics simulations revealed that these peptides stabilize critical enzyme regions, with ECFK showing a higher dissociation energy barrier (52.08 kcal/mol) than FECK (46.39 kcal/mol), indicating strong, stable interactions. This study highlights food-derived peptides as safe and natural inhibitors of xanthine oxidase, offering promising therapeutic potential for metabolic disorder management.
Title: Mechanistic Study of Protein Interaction with Natto Inhibitory Peptides Targeting Xanthine Oxidase: Insights from Machine Learning and Molecular Dynamics Simulations. Journal of Chemical Information and Modeling, 65(7), pp. 3682–3696.
Pub Date: 2025-03-24; DOI: 10.1021/acs.jcim.4c02338
Tobias M Prass, Kresten Lindorff-Larsen, Patrick Garidel, Michaela Blech, Lars V Schäfer
The high doses required for biotherapeutics such as monoclonal antibodies (mAbs), combined with the small volumes that can be administered to patients by subcutaneous injection, necessitate high-concentration formulations, which pose challenges. The addition of excipients, such as arginine and glutamate, to high-concentration protein formulations can increase solubility and reduce the tendency toward protein particle formation. Molecular dynamics (MD) simulations can provide microscopic insights into the mode of action of excipients in mAb formulations but require large system sizes and long time scales that are currently beyond reach at the fully atomistic level. Computationally efficient coarse-grained models such as the Martini 3 force field can tackle this challenge but require careful parametrization, testing, and validation. This study extends the popular Martini 3 force field toward realistic protein-excipient interactions of arginine and glutamate, using the Fab domains of the therapeutic mAbs trastuzumab and omalizumab as model systems. A novel all-atom to coarse-grained mapping of the amino acid excipients is introduced, which explicitly captures the zwitterionic character of the backbone. The Fab-excipient interactions of arginine and glutamate are characterized in terms of molecular contacts with the Fabs at the single-residue level. The Martini 3 simulations are compared with all-atom simulations as a reference. Our findings reveal that the default Martini 3 interaction parameters overestimate Fab-excipient contacts, suggesting an overly strong attraction between protein residues and excipients. Therefore, we reparametrized the protein-excipient interaction parameters in Martini 3 against all-atom simulations. The excipient interactions obtained with the new Martini 3 mapping and Lennard-Jones (LJ) interaction parameters, coined Martini 3-exc, agree closely with the all-atom reference data.
This work presents an improved parameter set for mAb-arginine and mAb-glutamate interactions in the Martini 3 coarse-grained force field, a key step toward large-scale coarse-grained MD simulations of high-concentration mAb formulations and the stabilizing effects of excipients.
Title: Optimized Protein-Excipient Interactions in the Martini 3 Force Field. Journal of Chemical Information and Modeling.
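The Martini 3 reparametrization described above acts on Lennard-Jones pair parameters. As a minimal illustration (these are not the Martini 3-exc parameters, which the abstract does not list), the sketch below evaluates the 12-6 potential V(r) = 4ε[(σ/r)^12 − (σ/r)^6] and shows that scaling the well depth ε down weakens the attraction at the minimum r = 2^(1/6)σ. The σ, ε and scaling factor are hypothetical values chosen only for illustration.

```python
def lennard_jones(r: float, sigma: float, epsilon: float) -> float:
    """12-6 Lennard-Jones pair energy V(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Hypothetical bead-pair parameters (nm, kJ/mol); not taken from the paper.
SIGMA = 0.47
EPSILON_DEFAULT = 4.0
SCALE = 0.8  # hypothetical down-scaling factor for an over-attractive pair

r_min = 2 ** (1 / 6) * SIGMA  # position of the potential minimum
v_default = lennard_jones(r_min, SIGMA, EPSILON_DEFAULT)
v_scaled = lennard_jones(r_min, SIGMA, SCALE * EPSILON_DEFAULT)

print(f"r_min = {r_min:.3f} nm")
print(f"well depth default: {v_default:.2f} kJ/mol, scaled: {v_scaled:.2f} kJ/mol")
```

Because V(r_min) = −ε, the well depth scales linearly with ε, so reducing ε is the most direct way to weaken an overestimated attraction; whether Martini 3-exc adjusts ε uniformly or pair by pair is not stated in the abstract.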
Pub Date: 2025-03-24; Epub Date: 2025-03-11; DOI: 10.1021/acs.jcim.5c00117
Chuan-Qi Sun, Zhi-Min Li, Yu Ji, Ulrich Schwaneberg, Zong-Lin Li
Expanding enzyme substrate spectra enhances industrial applications and drives sustainable biocatalysis. Despite advances, challenges in modification efficiency and high-throughput screening persist. Here, we developed a virtual screening method called CMDmpnn that combines comparative molecular dynamics (MD) simulations and ProteinMPNN to broaden enzyme substrate spectra without compromising other industrially important properties of enzymes, such as thermostability. Using glycosyltransferase as a model, we first established a dynamic model library of the wild-type enzyme through MD simulations and performed clustering. Subsequently, we utilized ProteinMPNN to generate a comprehensive set of new sequences for the entire library, enabling rapid identification of all possible enzyme variants. Short MD simulations were then conducted on variant-substrate complex models, with results compared to those of the wild-type enzyme. By analyzing catalytically relevant information such as substrate binding modes and key atomic distances, we identified multiple variants capable of catalyzing a broad spectrum of phenolic compounds, all within a timeframe of less than 2 weeks. The CMDmpnn method offers a powerful and efficient tool for rapidly expanding enzyme substrate spectra.
Title: CMDmpnn: Combining Comparative Molecular Dynamics and ProteinMPNN to Rapidly Expand Enzyme Substrate Spectrum. Journal of Chemical Information and Modeling, pp. 2741–2747.
Pub Date: 2025-03-24; Epub Date: 2025-03-11; DOI: 10.1021/acs.jcim.4c01529
So Eun Choi, MiYoung Jang, SoHee Yoon, SangHyun Yoo, Jooyeon Ahn, Minho Kim, Ho-Gyeong Kim, Yebin Jung, Seongeon Park, Young-Seok Kim, Taekhoon Kim
The application of large language models in materials science has opened new avenues for accelerating materials development. Building on this advancement, we propose a novel framework leveraging large language models to optimize experimental procedures for synthesizing quantum dot materials with multiple desired properties. Our framework integrates a synthesis protocol generation model and a property prediction model, both obtained by fine-tuning open-source large language models on in-house synthesis protocol data with parameter-efficient training techniques. Once a synthesis protocol is generated from the target properties and a masked reference protocol, it is validated by the property prediction models and then assessed for novelty and through human evaluation. Our synthesis experiments demonstrate that among the six synthesis protocols derived from the entire framework, three successfully update the Pareto front, and all six improve at least one property. Through empirical validation, we confirm the effectiveness of our fine-tuned large language model-driven framework for synthesis planning, showcasing strong performance under multitarget optimization.
Title: LLM-Driven Synthesis Planning for Quantum Dot Materials Development. Journal of Chemical Information and Modeling, pp. 2748–2758.
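"Updating the Pareto front," as claimed in the quantum dot abstract above, has a precise meaning in multitarget optimization: the new point is not dominated by any point already on the front. A minimal sketch, assuming every property is to be maximized and using hypothetical two-property scores (the abstract does not name the quantum dot properties or their optimization directions):

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def updates_front(front: List[Point], candidate: Point) -> bool:
    """A candidate updates the front if no current front point dominates or equals it."""
    return not any(dominates(p, candidate) or p == candidate for p in front)


# Hypothetical (property_1, property_2) scores of previously synthesized protocols.
history = [(0.60, 0.30), (0.50, 0.50), (0.20, 0.70), (0.40, 0.40)]
front = pareto_front(history)
print(front)
print(updates_front(front, (0.55, 0.45)))  # not dominated by any front point, so it joins the front
print(updates_front(front, (0.45, 0.45)))  # dominated by (0.50, 0.50)
```

A protocol can therefore update the front without beating every earlier result on every property, which is consistent with the abstract distinguishing "update the Pareto front" from "improve at least one property."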
Pub Date: 2025-03-24; DOI: 10.1021/acs.jcim.4c02256
Jian Wang and Nikolay V. Dokholyan*
A complex web of intermolecular interactions defines and regulates biological processes. Understanding this web has been particularly challenging because of the sheer number of actors in biological systems: the ∼10⁴ proteins in a typical human cell offer 10⁸ plausible interactions. This number grows rapidly if we consider metabolites, drugs, nutrients, and other biological molecules. The relative strength of interactions also critically affects these biological processes. However, the small and often incomplete data sets (10³–10⁴ protein–ligand interactions) traditionally used for binding affinity prediction limit the ability to capture the full complexity of these interactions. To overcome this challenge, we developed Yuel 2, a novel neural network-based approach that leverages transfer learning to address the limitations of small data sets. Yuel 2 is pretrained on a large-scale data set to learn intricate structural features and then fine-tuned on specialized data sets such as PDBbind to enhance predictive accuracy and robustness. We show that Yuel 2 predicts multiple binding affinity metrics (Kd, Ki, and IC50) between proteins and small molecules, offering a comprehensive representation of molecular interactions crucial for drug design and development.
Title: Leveraging Transfer Learning for Predicting Protein–Small-Molecule Interaction Predictions. Journal of Chemical Information and Modeling, 65(7), pp. 3262–3269.
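Kd, Ki and IC50, the three metrics named in the Yuel 2 abstract above, are related but not interchangeable: IC50 depends on assay conditions. For a competitive inhibitor, the Cheng–Prusoff relation converts an enzymatic IC50 to Ki given the substrate concentration [S] and the Michaelis constant Km, Ki = IC50 / (1 + [S]/Km). The sketch below applies it together with the common p-transform (pKi = −log10 of Ki in mol/L). The assay values are hypothetical, and the abstract does not say whether Yuel 2 uses either conversion.

```python
import math


def cheng_prusoff_ki(ic50_molar: float, substrate_molar: float, km_molar: float) -> float:
    """Ki from IC50 for a competitive inhibitor: Ki = IC50 / (1 + [S]/Km)."""
    return ic50_molar / (1.0 + substrate_molar / km_molar)


def p_value(affinity_molar: float) -> float:
    """Convert an affinity constant in mol/L to its negative log10 (e.g., pKi)."""
    return -math.log10(affinity_molar)


# Hypothetical assay: IC50 = 1 µM measured with [S] equal to Km (50 µM).
ki = cheng_prusoff_ki(1e-6, substrate_molar=50e-6, km_molar=50e-6)
print(f"Ki = {ki:.2e} M, pKi = {p_value(ki):.2f}")
```

With [S] = Km the conversion halves the IC50, so the same compound would carry a Ki label about 0.3 log units higher than its pIC50, one reason mixing the metrics without care can add label noise.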
Pub Date: 2025-03-24; Epub Date: 2025-03-04; DOI: 10.1021/acs.jcim.4c02192
Sayyed Jalil Mahdizadeh, Leif A Eriksson
In the quest to accelerate de novo drug discovery, the development of efficient and accurate scoring functions represents a fundamental challenge. This study introduces iScore, a novel machine learning (ML)-based scoring function designed to predict the binding affinity of protein-ligand complexes with remarkable speed and precision. Uniquely, iScore circumvents the conventional reliance on explicit knowledge of protein-ligand interactions and a full picture of atomic contacts, instead leveraging a set of ligand and binding pocket descriptors to directly evaluate binding affinity. This approach skips the slow and inefficient conformational sampling stage, enabling rapid screening of ultrahuge molecular libraries, a crucial advancement given the practically infinite size of chemical space. iScore was rigorously trained and validated using the PDBbind 2020 refined set, CASF 2016, CSAR NRC-HiQ Set1/2, DUD-E, and target fishing data sets, employing three distinct ML methodologies: deep neural network (iScore-DNN), random forest (iScore-RF), and eXtreme gradient boosting (iScore-XGB). A hybrid model, iScore-Hybrid, was subsequently developed to combine the strengths of these individual base learners. The hybrid model achieved a Pearson correlation coefficient (R) of 0.78 and a root-mean-square error (RMSE) of 1.23 in cross-validation, outperforming the individual base learners and establishing new benchmarks for scoring power (R = 0.814, RMSE = 1.34), ranking power (ρ = 0.705), and screening power (success rate at top 10% = 73.7%). Moreover, iScore-Hybrid performed strongly in the target fishing benchmarking study.
Title: iScore: A ML-Based Scoring Function for De Novo Drug Discovery. Journal of Chemical Information and Modeling, pp. 2759–2772. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11938276/pdf/
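The iScore scoring-power figures above are a Pearson correlation coefficient R and a root-mean-square error between predicted and experimental affinities. A minimal pure-Python sketch of both metrics, using hypothetical pK values rather than any iScore output:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the affinity labels (e.g., pKd)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# Hypothetical experimental and predicted binding affinities (pK units).
experimental = [4.0, 5.5, 6.0, 7.2, 8.1]
predicted = [4.6, 5.1, 6.4, 6.8, 7.5]
print(f"R = {pearson_r(predicted, experimental):.3f}, RMSE = {rmse(predicted, experimental):.3f}")
```

R is insensitive to a constant offset or scale error in the predictions, whereas RMSE penalizes absolute error, which is why benchmarks report both.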
Pub Date: 2025-03-24; Epub Date: 2025-02-27; DOI: 10.1021/acs.jcim.4c02401
Andrés Halabi Diaz, Mario Duque-Noreña, Elizabeth Rincón, Eduardo Chamorro
Nitroaromatic compounds (NAs) are widely used in industrial applications but pose significant genotoxic risks, necessitating accurate mutagenicity prediction for chemical safety assessments. This study integrates conceptual density functional theory (CDFT) descriptors with explainable no-code machine learning (ML) models to predict NA mutagenicity based on Ames test results. Following OECD QSAR guidelines, feature selection and model development were performed using decision-tree-based algorithms (Random Tree, JCHAID*, SPAARC) and multilayer perceptrons (MLPs). These models exhibited high predictive accuracy (internal: >80%, κ = 0.21–0.37; external: ∼90%, κ = 0.41–0.62) with strong interpretability. The study also explores the role of metabolic activation and aqueous-phase descriptors, evaluating a novel electronic analog of LogP (LogQP) to assess hydrophobicity-mutagenicity relationships. Results demonstrate that aqueous-phase electronic properties and electrophilicity descriptors outperform vacuum-based methods in mutagenicity prediction. The combination of CDFT descriptors with shallow ML models proves to be a robust, interpretable, and accessible framework for predictive toxicology. This approach enhances chemical risk assessment and bridges computational chemistry with toxicology for regulatory applications.
Title: Predicting the Mutagenic Activity of Nitroaromatics Using Conceptual Density Functional Theory Descriptors and Explainable No-Code Machine Learning Approaches. Journal of Chemical Information and Modeling, pp. 2950–2960.
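In the nitroaromatics abstract above, internal accuracies above 80% coexist with κ values of only 0.21–0.37. That pattern is typical of imbalanced classes, because Cohen's κ discounts the agreement expected by chance: κ = (p_o − p_e)/(1 − p_e). A minimal sketch for a binary confusion matrix, with hypothetical counts that are not the paper's data:

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 confusion matrix (predicted vs. observed labels)."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n  # plain accuracy
    # Chance agreement: predicted-class rate times observed-class rate, summed over classes.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)
    p_expected = p_pos + p_neg
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical imbalanced test set: 85 mutagenic and 15 non-mutagenic compounds.
tp, fp, fn, tn = 80, 12, 5, 3
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"accuracy = {accuracy:.2f}, kappa = {cohens_kappa(tp, fp, fn, tn):.2f}")
```

Here a classifier that labels almost everything mutagenic reaches 83% accuracy, yet κ is about 0.17 because chance agreement alone is 0.794.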
Pub Date: 2025-03-24; Epub Date: 2025-03-11; DOI: 10.1021/acs.jcim.5c00043
Yu Li, Lin-Xuan Hou, Hai-Cheng Yi, Zhu-Hong You, Shi-Hong Chen, Jia Zheng, Yang Yuan, Cheng-Gang Mi
Drug-drug interactions influence drug efficacy and patient prognosis, which makes predicting them of substantial research value. Some existing methods struggle with sparse networks or cannot integrate data from multiple sources. In this study, we propose MOLGAECL, a novel approach based on graph autoencoder pretraining and molecular graph contrastive learning. First, a graph autoencoder is pretrained on a large number of unlabeled molecular graphs, with graph contrastive learning applied to obtain more accurate drug representations. Subsequently, full-parameter fine-tuning is performed on different data sets to adapt the model to drug interaction-related prediction tasks. To assess the effectiveness of MOLGAECL, we conduct comparisons with state-of-the-art methods, fine-tuning comparison experiments, and a parameter sensitivity analysis. Extensive experimental results demonstrate the superior performance of MOLGAECL. Specifically, MOLGAECL achieves an average increase of 6.13% in accuracy, 6.14% in AUROC, and 8.16% in AUPRC across all data sets.
Title: MOLGAECL: Molecular Graph Contrastive Learning via Graph Auto-Encoder Pretraining and Fine-Tuning Based on Drug-Drug Interaction Prediction; Journal: Journal of Chemical Information and Modeling; Pages: 3104-3116; Type: Journal Article
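The abstract names the ingredients of MOLGAECL (graph autoencoder pretraining plus graph contrastive learning) but not its encoder, augmentations, or loss weighting. As a minimal sketch of that general recipe only, assuming a toy one-step mean-aggregation encoder, bond dropping as the augmentation, an inner-product decoder scored with binary cross-entropy for the autoencoder term, and the NT-Xent contrastive loss, the two pretraining objectives might look like this in plain Python. The function names (`node_embed`, `drop_edges`, `gae_recon_loss`, `nt_xent`) and the toy molecules are hypothetical and are not taken from the MOLGAECL code.

```python
# Hypothetical sketch of GAE + contrastive pretraining losses; not the MOLGAECL implementation.
import math
import random
from typing import List

Vec = List[float]
Adj = List[List[int]]  # symmetric adjacency lists, one entry per atom


def node_embed(x: List[Vec], adj: Adj) -> List[Vec]:
    """Toy encoder: each atom averages its own and its neighbours' features (one message-passing step)."""
    out = []
    for i, nbrs in enumerate(adj):
        group = [i] + nbrs
        out.append([sum(x[j][k] for j in group) / len(group) for k in range(len(x[i]))])
    return out


def pool(h: List[Vec]) -> Vec:
    """Mean-pool atom embeddings into one molecule-level embedding."""
    return [sum(col) / len(h) for col in zip(*h)]


def drop_edges(adj: Adj, p: float, rng: random.Random) -> Adj:
    """Augmentation: drop each undirected bond independently with probability p."""
    kept: Adj = [[] for _ in adj]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            if i < j and rng.random() >= p:
                kept[i].append(j)
                kept[j].append(i)
    return kept


def gae_recon_loss(h: List[Vec], adj: Adj, eps: float = 1e-12) -> float:
    """Autoencoder term: inner-product decoder scored with binary cross-entropy over atom pairs."""
    total, pairs = 0.0, 0
    for i in range(len(h)):
        for j in range(len(h)):
            if i == j:
                continue
            logit = sum(a * b for a, b in zip(h[i], h[j]))
            prob = min(max(1.0 / (1.0 + math.exp(-logit)), eps), 1.0 - eps)
            y = 1.0 if j in adj[i] else 0.0
            total -= y * math.log(prob) + (1.0 - y) * math.log(1.0 - prob)
            pairs += 1
    return total / pairs


def nt_xent(z1: List[Vec], z2: List[Vec], tau: float = 0.5) -> float:
    """Contrastive term: the two views of molecule i must match each other among all 2N views."""
    def unit(v: Vec) -> Vec:
        norm = math.sqrt(sum(a * a for a in v)) or 1.0
        return [a / norm for a in v]

    z = [unit(v) for v in z1 + z2]
    n = len(z1)
    loss = 0.0
    for i in range(2 * n):
        sims = [sum(a * b for a, b in zip(z[i], z[j])) / tau for j in range(2 * n)]
        denom = sum(math.exp(s) for j, s in enumerate(sims) if j != i)
        loss -= sims[(i + n) % (2 * n)] - math.log(denom)  # -log softmax of the positive pair
    return loss / (2 * n)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical 3-atom molecules with 2-dimensional atom features.
    mols = [
        ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[1], [0, 2], [1]]),
        ([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]], [[1, 2], [0], [0]]),
    ]
    views1, views2, recon = [], [], 0.0
    for x, adj in mols:
        recon += gae_recon_loss(node_embed(x, adj), adj)
        views1.append(pool(node_embed(x, drop_edges(adj, 0.2, rng))))
        views2.append(pool(node_embed(x, drop_edges(adj, 0.2, rng))))
    print("reconstruction:", recon / len(mols), "contrastive:", nt_xent(views1, views2))
```

In an actual pretraining loop the two terms would typically be combined (possibly with a weighting factor) and minimized by gradient descent over a learnable encoder; the fixed, parameter-free encoder here only makes the shape of the losses visible, and the supervised fine-tuning stage is omitted.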
Pub Date: 2025-03-24; Epub Date: 2025-03-10; DOI: 10.1021/acs.jcim.4c02214
Yu-Hong Liu, Hong-Quan Xu, Si-Si Zhu, Yan-Feng Hong, Xiu-Wen Li, Hong-Xiu Li, Jun-Peng Xiong, Huan Xiao, Jin-Hui Bu, Feng Zhu, Lin Tao
Viruses are significant human pathogens responsible for pandemic outbreaks and seasonal epidemics. Viral infectious diseases impose a devastating global burden and have a profound impact on public health systems. During viral infection, alternative splicing (AS) plays a crucial role in regulating immune responses, altering the host cellular environment, expanding the coding capacity of viral genomes, and facilitating viral replication. As research on AS in viral infections grows, it is crucial to consolidate data on virus-related splicing changes to improve our understanding of these viruses and the diseases they cause. To address this need, we created ASVirus (https://bddg.hznu.edu.cn/asvirus/), a comprehensive database of virus-associated AS events and their regulatory factors. ASVirus uniquely integrates high-confidence, experimentally validated splicing data and explores upstream regulatory mechanisms through a gene-splicing factor interaction network. Its user-friendly web interface provides detailed information on AS events across various viral families and on the resulting mis-splicing of host genes, aiding the exploration of novel viral infection mechanisms and the identification of critical therapeutic targets for viral diseases.
Title: ASVirus: A Comprehensive Knowledgebase for the Viral Alternative Splicing; Journal: Journal of Chemical Information and Modeling; Pages: 2722-2729; Type: Journal Article
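The abstract does not describe how ASVirus queries its gene-splicing factor interaction network. As an illustration only, one common way to nominate candidate upstream regulators is to rank splicing factors whose target sets overlap the host genes mis-spliced during an infection, scoring each overlap with a one-sided hypergeometric test. The network layout, the function `upstream_regulators`, and the toy gene and factor names below are hypothetical and are not ASVirus's schema, API, or data.

```python
# Hypothetical sketch: not ASVirus's schema, API, or data.
from math import comb
from typing import Dict, List, Set, Tuple

# Toy interaction network: splicing factor -> set of host genes it regulates.
Network = Dict[str, Set[str]]


def upstream_regulators(
    network: Network, mis_spliced: Set[str], universe_size: int
) -> List[Tuple[str, int, float]]:
    """Rank splicing factors whose targets overlap the mis-spliced host genes.

    Each factor gets a one-sided hypergeometric p-value: the probability of
    seeing at least the observed overlap if len(mis_spliced) genes were drawn
    at random from universe_size genes (targets assumed to lie in the universe).
    """
    drawn = len(mis_spliced)
    total = comb(universe_size, drawn)
    ranked = []
    for factor, targets in network.items():
        overlap = len(targets & mis_spliced)
        if overlap == 0:
            continue  # no evidence that this factor acts upstream
        k = len(targets)
        tail = sum(
            comb(k, x) * comb(universe_size - k, drawn - x)
            for x in range(overlap, min(k, drawn) + 1)
        )
        ranked.append((factor, overlap, tail / total))
    # Most significant first; ties broken by larger overlap, then by name.
    ranked.sort(key=lambda r: (r[2], -r[1], r[0]))
    return ranked


if __name__ == "__main__":
    network = {
        "factor_a": {"gene1", "gene2", "gene3", "gene4"},
        "factor_b": {"gene1", "gene5", "gene6", "gene7", "gene8"},
        "factor_c": {"gene9"},
    }
    mis_spliced = {"gene1", "gene2", "gene3"}
    for factor, overlap, p in upstream_regulators(network, mis_spliced, universe_size=10):
        print(f"{factor}: overlap={overlap}, p={p:.3f}")
```

With this toy data, `factor_a` (all three mis-spliced genes among its four targets) ranks first with p = 4/120, about 0.033, while `factor_b` shares only one gene and gets p = 110/120. A production resource would also need multiple-testing correction and a carefully chosen gene universe.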