Pub Date: 2025-11-25 DOI: 10.1007/s10822-025-00719-9
Juda Baikété, Alhadji Malloum, Jeanet Conradie
The pKa, the negative base-10 logarithm of the acid dissociation constant, is a crucial parameter that defines the ionization level of a molecule in solution. It is essential for several physicochemical properties, including lipophilicity, solubility, protein binding affinity, and the ability to cross biological membranes. Therefore, obtaining accurate pKa assessments is vital for modifying and refining the acidity and basicity of organic compounds. Accurate prediction can help improve drug design, optimize pharmaceutical formulations, analyze the behavior of pollutants in the environment, and guide the development of new materials. Traditionally, pKa determination has relied on experimental techniques. However, the recent emergence of machine learning (ML) has led to significant advances in pKa prediction. In this review, we examine various approaches for pKa prediction, with a focus on recent advances in machine learning. We discuss the performance of these models, drawing on results reported in publications related to the SAMPL Challenges and Novartis prediction challenges. Because of their different theoretical and computational frameworks, protein pKa prediction methods are not included in this review, which focuses exclusively on small organic molecules. Finally, we highlight current challenges and future directions, including the integration of hybrid models combining quantum mechanics and machine learning, the improvement of benchmark databases, and the development of more universal and interpretable predictive models. We hope that this paper can provide useful guidelines for future research.
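As a brief illustration of why the pKa governs the ionization level in solution, the sketch below applies the Henderson-Hasselbalch relation to a single monoprotic site. The function name `fraction_ionized` and the example values are illustrative assumptions, not taken from the review.

```python
def fraction_ionized(pka: float, ph: float, acid: bool = True) -> float:
    """Fraction of molecules in the charged form at a given pH.

    Acid  (HA <-> A- + H+):  ionized fraction = 1 / (1 + 10**(pKa - pH))
    Base  (B + H+ <-> BH+):  ionized fraction = 1 / (1 + 10**(pH - pKa))
    Assumes one ionizable site and ideal dilute-solution behaviour.
    """
    exponent = (pka - ph) if acid else (ph - pka)
    return 1.0 / (1.0 + 10.0 ** exponent)


if __name__ == "__main__":
    # Hypothetical acid (pKa 4.8) and hypothetical base (pKa 9.0) at pH 7.4.
    print(f"acid ionized fraction: {fraction_ionized(4.8, 7.4):.4f}")
    print(f"base ionized fraction: {fraction_ionized(9.0, 7.4, acid=False):.4f}")
```

When the pH equals the pKa, half of the molecules are ionized; each pH unit away from the pKa changes the ionized-to-neutral ratio by a factor of ten, which is why small pKa errors can matter for solubility or membrane-permeation estimates.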
Title: pKa prediction for small molecules: an overview of experimental, quantum, and machine learning-based approaches. Journal of Computer-Aided Molecular Design, vol. 40, no. 1.
Antimicrobial resistance (AMR) is a global concern that demands high-throughput and precise surveillance strategies. This review provides a comprehensive list of artificial intelligence (AI)-driven frameworks widely employed in early detection, structural characterization, and the design of novel inhibitors that block the resistance pathways critical for AMR. Deep learning algorithms, including DeepGO, DeepGOPlus, DeepGO-SE, PFresGO, DPFunc, and ProtENN, and graph-based architectures such as GraphSite and GrASP enable precise functional annotation of resistance-associated proteins. AI-guided protein modeling with AlphaFold, RoseTTAFold, ProtGPT-2, ESMFold, and similar tools generates high-resolution 3D conformations, which are then used for molecular docking with tools such as AutoDock, DeepDocking, and DeepChem and analyzed with tools such as DeepDriveMD, TorchMD, and PRITHVI, which can perform real-time molecular dynamics simulations. Relevant resistance biomarkers can also be identified from mass-spectrometry profiles with the help of DeepNovo, Casanovo, or Prosit. Tools such as DeepARG, HMD-ARG, and BacEffluxPred enable the identification of unannotated resistance genes from metagenomic samples. Natural Language Processing (NLP) and Large Language Models (LLM) facilitate the identification of resistance determinants via literature mining, enabling regulatory network mapping and rational inhibitor design. Furthermore, AI-mediated de novo inhibitor design is achieved using Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and diffusion and flow-matching based frameworks, which serve as potential options for enhancing diagnostic interventions against resistant phenotypes. AI-based protein–protein interaction predictors, including DeepInteract, Pred_PPI, PLIP, DeepAIPs-Pred, DeepAIPs-SFLA, SBSM-Pro, Deep Stacked-AVPs, and pNPs-CapsNet, help in understanding how resistance proteins interact with each other, enabling precise identification of AMR-modulating peptides and supporting the modeling of novel antibiotics that block interactions and disrupt resistance pathways.
Title: Artificial intelligence in protein-based detection and inhibition of AMR pathways. Authors: Suchandrima Sadhukhan, Rupsa Bhattacharya, Debasmita Bhattcharya, Sudipta Sahana, Buddhadeb Pradhan, Soumya Pandit, Harjot Singh Gill, Mithul Rajeev, Moupriya Nag, Dibyajit Lahiri. Journal of Computer-Aided Molecular Design, vol. 40, no. 1. Pub Date: 2025-11-25 DOI: 10.1007/s10822-025-00710-4
Pub Date: 2025-11-25 DOI: 10.1007/s10822-025-00708-y
Amílcar Duque-Prata, Carlos Serpa, Pedro J. S. B. Caridade
This study introduces a simple and computationally efficient protocol for estimating aqueous \(\text{p}K_a\) values in three major classes of functional groups: organic acids, alcohols, and amines. Although direct density functional theory calculations yielded notable discrepancies from experimental values, the application of class-specific linear calibration significantly improved predictive accuracy. The correlation coefficients increased from 0.67 (uncalibrated) to 0.98 (calibrated), with mean absolute errors of 0.51, 0.69 and 0.37 \(\text{p}K_a\) units for acids, alcohols, and amines, respectively. The observed class-dependent linear trends validate the chemical consistency of the approach, even in the presence of structural diversity. Correlation analysis showed that predictive errors are largely uncorrelated with standard molecular descriptors, indicating that model performance is predominantly governed by the functional group of the ionizable proton. By avoiding subclass distinctions and relying solely on functional group identity, the method maintains simplicity and broad applicability without sacrificing accuracy. Most predictions fall within \(\pm 0.75\) \(\text{p}K_a\) units, supporting the robustness of the protocol. The approach offers a practical framework for the systematic, routine estimation of aqueous \(\text{p}K_a\) in various chemical contexts.
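The class-specific linear calibration described in this abstract can be sketched as an ordinary least-squares line fitted separately for each functional-group class. This is a minimal sketch under stated assumptions: the helper names (`fit_line`, `fit_class_calibrations`, `calibrate`) and the toy numbers are hypothetical, and the authors' actual fitting procedure and coefficients are not given in the abstract.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_class_calibrations(records):
    """records: iterable of (class_name, computed_pka, experimental_pka)."""
    grouped = defaultdict(lambda: ([], []))
    for cls, computed, experimental in records:
        grouped[cls][0].append(computed)
        grouped[cls][1].append(experimental)
    # One independent (slope, intercept) pair per functional-group class.
    return {cls: fit_line(xs, ys) for cls, (xs, ys) in grouped.items()}


def calibrate(models, cls, computed_pka):
    """Map a raw computed pKa onto the experimental scale for its class."""
    slope, intercept = models[cls]
    return slope * computed_pka + intercept


if __name__ == "__main__":
    # Made-up training pairs, for illustration only (not data from the paper).
    toy = [
        ("acid", 2.0, 3.0), ("acid", 4.0, 4.0), ("acid", 6.0, 5.0),
        ("amine", 10.0, 9.0), ("amine", 12.0, 10.0), ("amine", 14.0, 11.0),
    ]
    models = fit_class_calibrations(toy)
    print(models)
    print(calibrate(models, "acid", 8.0))
```

Keeping one line per class, rather than one global fit, reflects the abstract's observation that the trends are class-dependent.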
Title: Calibrating the gap: a user-friendly aqueous \(\text{p}K_a\) prediction protocol for organic acids, alcohols, and amines. Journal of Computer-Aided Molecular Design, vol. 40, no. 1.
Pub Date: 2025-11-25 DOI: 10.1007/s10822-025-00707-z
Muhammad Tayyab, Khalid Mahmood, Khawar Abbas, Farhan Siddique, Nastaran Sadeghian, Halil Şenol, Maryam Bashir, Parham Taslimi, Abdullah K. Alanazi, Mostafa A. Ismail, Xianliang Zhao, Zahid Shafiq
A novel series of thiosemicarbazone derivatives 6(a–i), synthesized from 4-formyl-2-nitrophenyl quinoline-8-sulfonate, was evaluated for its antidiabetic potential. Among them, compound 6i (IC₅₀ = 54.51 ± 0.84 µM) displayed the most potent α-glucosidase inhibition, whereas 6e (IC₅₀ = 9.66 ± 0.14 µM) exhibited superior α-amylase inhibition, indicating their dual therapeutic potential against key carbohydrate-hydrolyzing enzymes implicated in postprandial hyperglycemia. These derivatives showed structural diversity with potent and selective inhibition profiles. Structure-activity relationship analysis revealed that electron-withdrawing substituents enhanced enzyme affinity and biological activity. Molecular docking studies demonstrated strong binding affinities for compounds 6f and 6b, with docking scores of −9.1 to −10.4 kcal/mol against the target proteins, via hydrogen bonding and π–π interactions with catalytic residues. Furthermore, in-silico ADMET evaluation predicted good oral bioavailability, low toxicity, and favorable pharmacokinetic properties. Density Functional Theory (DFT) calculations supported the experimental results: the studied compounds showed low HOMO–LUMO energy gaps (2.41–3.42 eV), suggesting significant chemical reactivity and molecular stability. Overall, the in-vitro and in-silico studies identified compounds 6b, 6f, 6e, and 6i as promising lead molecules for developing dual-action therapeutic agents targeting hyperglycemia and oxidative damage in diabetes management.
Title: Design, synthesis, pharmacological evaluation and computational modeling of 4-formyl-2-nitrophenyl quinoline-8-sulfonate derived thiosemicarbazones as antidiabetic agents. Journal of Computer-Aided Molecular Design, vol. 40, no. 1.
Pub Date: 2025-11-25 DOI: 10.1007/s10822-025-00709-x
S. Sam Jaikumar, S. Mary Praveena
Early breast cancer (BC) diagnosis has the potential to drastically cut BC death rates in the long term. Identifying early-stage cancer cells is the most crucial step in determining the best prognosis. Despite recent advances in the use of AI-based methods, such as machine learning and deep learning (DL), to detect breast cancer, current models are generally limited to simple binary classification, rely on a single source of data, and lack transparency, thereby limiting their clinical applicability. To overcome these limitations, we propose an Explainable Artificial Intelligence (AI)-based Residual Tabular Network (ResTab Net) model that integrates histopathological images and molecular protein expression data patterns to conduct multimodal BC diagnosis. The proposed model utilizes Adaptive Tissue-Aware Gaussian Filtering (ATGF) to enhance the image, Entropy Enhanced Graph-Watershed Segmentation (EGWS) to clearly define the tumor’s location, and Self-Adaptive Starfish Optimization (SASFO) to select the features. A hybrid framework of residual convolutional blocks and dense layers facilitates multiclass classification. To ensure transparency and clinical trust, the model incorporates SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME), which illustrate the impact of molecular protein levels and image features on classification results. The proposed Explainable AI-ResTab Net model is implemented in Python. In the performance evaluation, the proposed model achieves an accuracy of 98.56%, a precision of 98.10%, a recall of 98.00%, an F1-score of 98.03%, and an Area Under the Curve (AUC) of 99.60%.
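To make the reported accuracy, precision, recall, and F1-score concrete, the sketch below derives them from a multiclass confusion matrix. It assumes macro averaging (an unweighted mean over classes); the abstract does not state which averaging scheme the authors used, and the function name `macro_metrics` and the toy matrix are hypothetical.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of samples with true class i predicted as class j."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(k))  # column sum
        actual_c = sum(confusion[c])                          # row sum
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    # Made-up two-class confusion matrix, for illustration only.
    print(macro_metrics([[8, 2], [1, 9]]))
```

With imbalanced classes, macro and support-weighted averages can differ noticeably, so the averaging choice is worth reporting alongside the scores.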
Title: Breast cancer diagnosis from histopathological images and molecular signatures by fusing features with an explainable AI-based residual tabular network model. Journal of Computer-Aided Molecular Design, vol. 40, no. 1.
Accurately predicting polymer density from SMILES strings remains challenging due to the small size, high noise, and chemical diversity of typical datasets. We introduce LiteBoost, a deliberately minimalist gradient boosting model that employs shallow, three-level symmetric trees and exposes only two tunable hyperparameters (n_estimators and learning_rate). Using a curated dataset of 613 polymers, we benchmark LiteBoost against ExtraTrees, XGBoost, LightGBM, and CatBoost, optimizing each with 100–1000 Optuna trials and evaluating performance across seven complementary metrics: R², RMSE, MAE, median AE, MAPE, maximum error, and explained variance. LiteBoost achieves an MAE of 0.031 g/cm³, an RMSE of 0.062 g/cm³, an R² of 0.81, and a MAPE of 3.03%, all within 2–3% of the best-in-class CatBoost and XGBoost scores and well within the bounds of experimental uncertainty. Crucially, it does so with orders-of-magnitude fewer hyperparameters. These results demonstrate that a streamlined boosting model can rival heavyweight ensembles in accuracy while dramatically reducing tuning effort, computational cost, and interpretability barriers. LiteBoost is thus a practical first-line surrogate model for high-throughput polymer screening and inverse-design workflows where speed, robustness, and transparency are as critical as raw predictive power.
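The seven evaluation metrics named in this abstract can be computed as in the following sketch, which uses the usual textbook definitions and only the Python standard library. The function name `regression_report` and the toy densities are hypothetical; the authors' own evaluation code is not shown in the abstract.

```python
import math
from statistics import mean, median, pvariance


def regression_report(y_true, y_pred):
    """Seven regression metrics for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_err = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / len(errors)),
        "mae": mean(abs_err),
        "median_ae": median(abs_err),
        # MAPE is undefined when a true value is zero; densities are positive.
        "mape_percent": 100.0 * mean(a / abs(t) for a, t in zip(abs_err, y_true)),
        "max_error": max(abs_err),
        # Unlike R2, explained variance ignores a constant bias in the errors.
        "explained_variance": 1.0 - pvariance(errors) / pvariance(y_true),
    }


if __name__ == "__main__":
    # Made-up densities in g/cm^3, for illustration only.
    true_density = [0.92, 1.05, 1.18, 1.38]
    predicted = [0.95, 1.02, 1.20, 1.30]
    for name, value in regression_report(true_density, predicted).items():
        print(f"{name}: {value:.4f}")
```

Reporting median AE and maximum error next to MAE and RMSE helps separate typical behaviour from the influence of a few outlying polymers.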
Title: LiteBoost: a lightweight and explainable boosting model for predicting polymer density from SMILES data. Authors: Tuan Nguyen-Sy, Hieu Do-Trung, Nam Nguyen-Hoang, Duc Toan Truong, My-Kristyna Nguyen-Thao. Journal of Computer-Aided Molecular Design, vol. 39, no. 2. Pub Date: 2025-11-14 DOI: 10.1007/s10822-025-00693-2
Pub Date: 2025-11-12 DOI: 10.1007/s10822-025-00705-1
Dileep Kumar Murala
Deep generative models may detect novel compounds with favourable features, exhibiting chemical design potential. Traditional single-stage variational autoencoders (VAEs) lack validity, uniqueness, and biologically meaningful distribution alignment. It is difficult to represent global molecular architecture and chemical properties in a single latent representation. To overcome these challenges, we offer a multi-stage VAE system that encodes and decodes molecular representations in sequence. Improvements to latent space retain structural integrity while also adding innovation and distinction. Validity, originality, novelty, Fréchet ChemNet Distance (FCD), and KL divergence are used to validate the methodology with ChEMBL and polymer datasets. The bioefficacy of EGFR inhibitors is evaluated using computational Chemprop-based QSAR models. We offer adaptive fine-tuning strategies for the inner-layer (IL) and outer-layer (OL) to improve generating accuracy. IL adaptability is most suited to active compounds. Quantitative evaluations indicate consistent gains in validity, novelty, and biological activity over strong baselines (for example, MoLeR and RationaleRL). We give MNIST tests that confirm the hierarchical training method’s stability but not its scalability beyond molecular tasks, ensuring cross-domain applicability. For generative drug discovery, hierarchical latent models with a multi-stage VAE are advised.
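The generation metrics listed in this abstract can be illustrated with a small sketch. It assumes commonly used definitions (validity as the share of generated strings that parse, uniqueness as the share of distinct strings among valid ones, novelty as the share of distinct valid strings absent from the training set) plus a discrete KL divergence; the paper's exact definitions may differ. The `is_valid` callable is a hypothetical stand-in for a real chemistry parser, and Fréchet ChemNet Distance is omitted because it needs a pretrained network.

```python
import math


def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness, and novelty for a list of generated strings.

    Strings are compared as written; a real pipeline would canonicalize
    them first so that equivalent molecules are counted once.
    """
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as aligned probability lists.

    Requires q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Toy validity rule for illustration only: reject one malformed string.
    toy_valid = lambda s: s != "C1CC"
    samples = ["CCO", "CCO", "C1CC", "CCN"]
    print(generation_metrics(samples, {"CCO"}, toy_valid))
    print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```

These three ratios pull in different directions: a model can reach high validity by repeating training molecules, which is why uniqueness and novelty are reported alongside it.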
Title: Multi-stage variational autoencoders for hierarchical molecular generation and activity optimization. Journal of Computer-Aided Molecular Design, vol. 39, no. 2.
Pub Date: 2025-11-12 DOI: 10.1007/s10822-025-00701-5
Gabriela L. Borosky
Quantum-mechanical (QM) methods were applied to compute the relative binding energies of a set of structurally similar alkaline phosphatase (AP) inhibitors, using human placental AP (PLAP) as a model AP. The theoretical binding affinities were compared with their corresponding experimental inhibitory potencies. The calculated interaction energies reproduced the experimental activity order, showing linear correlations between QM relative binding energies and experimental pIC₅₀ values with coefficients of determination R² = 0.86–0.97. Examination of the binding interactions for the test inhibitors revealed that the AP inhibitory activity is determined by the catechol group and the benzimidazole/imidazole moieties of the ligands. The studied compounds formed protein–ligand complexes inside the active site of PLAP, suggesting they are competitive inhibitors. The present theoretical results are expected to be useful in developing new potent AP inhibitors. The employed computational approach for estimating QM protein–ligand interaction energies is proposed as a suitable drug design tool for predicting reliable QM relative binding affinities of structurally related compounds.
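The comparison described in this abstract, a linear correlation between computed relative binding energies and experimental pIC₅₀ values, can be sketched as follows. The helper names and all numbers are hypothetical illustrations, not data from the paper. For a simple one-variable linear fit, the coefficient of determination equals the squared Pearson correlation, which is what `r_squared` computes.

```python
import math


def pic50_from_micromolar(ic50_um: float) -> float:
    """pIC50 = -log10(IC50 in mol/L), with 1 uM = 1e-6 mol/L."""
    return -math.log10(ic50_um * 1e-6)


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Made-up relative binding energies (kcal/mol) and IC50 values (uM).
    relative_energies = [-12.0, -9.5, -8.0, -5.0]
    ic50_um = [0.1, 0.5, 2.0, 10.0]
    pic50 = [pic50_from_micromolar(v) for v in ic50_um]
    print(f"R^2 = {r_squared(relative_energies, pic50):.3f}")
```

Because pIC₅₀ grows as IC₅₀ falls, more negative (more favourable) binding energies are expected to pair with higher pIC₅₀ values, giving a negative slope with a high R².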
{"title":"Evaluation of Protein-Ligand binding interactions of alkaline phosphatase inhibitors by Quantum-Mechanical methods","authors":"Gabriela L. Borosky","doi":"10.1007/s10822-025-00701-5","journal":{"name":"Journal of Computer-Aided Molecular Design","volume":"39 2"},"PeriodicalIF":3.1,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","isOpenAccess":false}
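The abstract above reports linear correlations between QM relative binding energies and experimental pIC50 values, summarized by R². As a minimal sketch of how such a coefficient of determination can be computed from an ordinary least-squares line, the snippet below uses only the Python standard library. The function name `linear_fit_r2` and the numeric values are hypothetical placeholders for illustration; they are not data or code from the study.

```python
def linear_fit_r2(x, y):
    """Fit y ~ slope * x + intercept by least squares and return
    (slope, intercept, r2), where r2 is the coefficient of determination."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical placeholder values, NOT taken from the paper.
rel_binding_energy = [0.0, -1.2, -2.9, -4.1, -5.8]  # relative energies, kcal/mol
pic50 = [4.9, 5.3, 6.1, 6.4, 7.2]                   # experimental potencies

slope, intercept, r2 = linear_fit_r2(rel_binding_energy, pic50)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r2:.3f}")
```

In this convention a more negative relative binding energy corresponds to a higher pIC50, so the fitted slope is negative. R² measures how much of the variance in potency the linear model accounts for.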
Pub Date: 2025-11-10 DOI: 10.1007/s10822-025-00698-x
Kashif Iqbal Sahibzada, Rizwan Abid, Haseeb Nisar, Reham A. Abd El Rahman, Muhammad Idrees, Dong-Qing Wei, Yuansen Hu, Saima Sadaf
Pakistan currently holds the second-highest prevalence rate of Hepatitis C virus (HCV) globally. This makes it crucial to continuously monitor the circulating genotypes in the population, especially among people who inject drugs (PWIDs), as they pose a significant risk of spreading new genotypes in the population. To address this issue, we identified the circulating HCV genotypes among PWIDs and non-PWIDs through Next Generation Sequencing (NGS). Additionally, a multi-epitope vaccine was designed through an immunoinformatic approach using NGS and Sanger sequencing results. The study indicated genotype 3a as the most prevalent genotype among the 61 HCV cases tested through NGS, followed by genotype 1a. Non-allergenic and highly antigenic MHC Class-I and Class-II epitopes were retrieved from non-structural proteins. Furthermore, B-cell epitopes were retrieved from the E2 protein. The selected epitopes showed a population coverage rate of 88.26%. Based on large conformational simulation analysis from NMSims, the four best constructs suitable for vaccine design were further evaluated for their binding energies through all-atom molecular dynamics simulations and MM-GBSA. One of the constructs showed a low binding energy value with MHC, indicating its potential as a vaccine candidate. However, further experimental work is required to determine its efficacy and safety profile. This research emphasizes the promise of combining multiepitope vaccine design with advanced computational methods to accelerate and improve vaccine development, thereby filling a crucial gap in the fight against rising antibiotic resistance.
{"title":"HCV genotyping and rational computational designing of an immunogenic multiepitope vaccine against genotype 3a","authors":"Kashif Iqbal Sahibzada, Rizwan Abid, Haseeb Nisar, Reham A. Abd El Rahman, Muhammad Idrees, Dong-Qing Wei, Yuansen Hu, Saima Sadaf","doi":"10.1007/s10822-025-00698-x","journal":{"name":"Journal of Computer-Aided Molecular Design","volume":"39 2"},"PeriodicalIF":3.1,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","isOpenAccess":false}
FYN, a member of the Src family kinases (SFKs) and a non-receptor tyrosine kinase, plays a critical role in signal transduction within the nervous system and is instrumental in the activation and development of T lymphocytes. While the biological significance of FYN kinase in various cellular processes is well recognized, its potential as a therapeutic target remains largely unexplored. In this study, we investigated the potential of natural products (NPs) as preferential inhibitors of FYN kinase. A library of over 3500 NPs was screened for binding affinity with FYN kinase (PDB: 2DQ7) using XGlide docking simulations. The fourteen NPs with the highest docking scores were selected for further analysis. Their interactions with FYN kinase were evaluated through MM-GBSA calculations, and ADMET profiling was performed using SwissADME and pkCSM tools to assess pharmacokinetic properties. Molecular dynamics (MD) simulations using Desmond further confirmed the stability of FYN-NP complexes in solvent environments. Of the top fourteen NPs, only oroxin A demonstrated favorable drug-like properties and sustained stable binding to FYN kinase, as evidenced by MD simulations. In vitro kinase inhibition assays revealed that oroxin A exhibited dose-dependent inhibition of FYN kinase, and C. elegans viability assays confirmed its low toxicity. Cross-docking further revealed that, although oroxin A binds to multiple SFKs owing to the conserved ATP-binding pocket, it displayed stronger binding toward FYN, suggesting a binding preference for FYN. This study provides a comprehensive evaluation of NPs as potential FYN kinase inhibitors and identifies oroxin A as a natural compound with preliminary evidence of FYN inhibition, warranting further validation.
{"title":"Structure-based identification and experimental evaluation of Oroxin A as a FYN kinase inhibitor","authors":"Vipul Agarwal, Chaitany Jayprakash Raorane, Anugya Gupta, Divya Shastri, Vinit Raj, Sangkil Lee","doi":"10.1007/s10822-025-00700-6","journal":{"name":"Journal of Computer-Aided Molecular Design","volume":"39 2"},"PeriodicalIF":3.1,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","isOpenAccess":false}
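The abstract above, like the HCV vaccine abstract before it, relies on MM-GBSA to rank binding. For orientation, the standard textbook form of the MM-GBSA estimate is sketched below. This is the generic decomposition, stated as an assumption. Neither abstract specifies its exact protocol, and the entropy term in particular is often omitted in practice.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right), \\
G &= E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - T S
\end{aligned}
```

Here $E_{\text{MM}}$ is the gas-phase molecular-mechanics energy, $G_{\text{GB}}$ the polar solvation free energy from the generalized Born model, $G_{\text{SA}}$ the nonpolar solvation term estimated from the solvent-accessible surface area, and $TS$ the entropic contribution. A more negative $\Delta G_{\text{bind}}$ indicates more favorable predicted binding.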