Pub Date: 2025-11-19; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1710926
Jack M Craig, Whitney L Fisher, Allan S Thomas, S Blair Hedges, Sudhir Kumar
Afrotheria, the superorder that includes aardvarks, elephants, elephant shrews, hyraxes, manatees, and tenrecs, is home to some of the most charismatic and well-studied animals on Earth. Here, we assemble a nearly taxonomically complete molecular timetree of Afrotheria using an integrative approach that combines a literature search for published timetrees, de novo dating of untimed molecular phylogenies, and inference of timetrees from new alignments. The resulting timetree sheds light on the impact of the Cretaceous-Paleogene (K-Pg) event ∼66 million years ago on the diversification of Afrotherian orders. The earliest divergence in the timetree of Afrotherian mammals predates the K-Pg event by 12 million years, followed by five interordinal divergences that occurred gradually over a 16-million-year period encompassing the K-Pg event.
Title: Completing a molecular timetree of Afrotheria. Journal: Frontiers in Bioinformatics (impact factor 3.9), vol. 5, article 1710926. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12672906/pdf/
Pub Date: 2025-11-18; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1716375
Manisha Shah, Sivakumar Arumugam
Introduction: Tumor necrosis factor-alpha (TNF-alpha) is a central mediator of chronic inflammation and a validated therapeutic target in atherosclerosis and related cardiovascular disorders. Peptide therapeutics offer high specificity and low toxicity; however, few natural sequences have been optimized for durable TNF-alpha inhibition.
Methods: A dual in silico strategy was employed to identify potent inhibitors: (i) virtual screening of experimentally validated food-derived bioactive peptides and (ii) rational design of an N-to-C cyclized, disulfide-bridged peptide based on the TNF-alpha-TNFR1 interface. Molecular docking, 200-ns molecular dynamics simulations, and MM/PBSA free-energy analyses were performed.
Results: The selected peptides exhibited strong and persistent interactions with key TNF-alpha residues, particularly Tyr119. The cyclic analogue demonstrated deeper free-energy minima, higher binding affinity, and more stable hydrogen-bond networks than the linear sequence. ADMET profiling revealed superior metabolic stability, reduced plasma clearance, and no predicted cardiotoxicity.
Discussion: These results indicate that dietary peptides can serve as templates for TNF-alpha inhibition, and interface-guided cyclization rationally enhances stability, binding affinity, and drug-like properties. This study provides a mechanistic framework for developing food-derived peptides as next-generation TNF-alpha antagonists and supports United Nations SDGs 3 and 9 by promoting innovative, low-toxicity therapeutics for chronic inflammation and cardiovascular diseases.
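The MM/PBSA step named in the Methods can be illustrated with a minimal Python sketch. It assumes the common single-trajectory form of the estimate, in which the binding free energy is the frame-averaged difference G_complex - G_receptor - G_ligand; the abstract does not state which variant the authors used. The function name and the per-frame energies are hypothetical placeholders, not values from the study.

```python
from statistics import mean, stdev


def mmpbsa_binding_energy(g_complex, g_receptor, g_ligand):
    """Return (mean, sample std) of per-frame binding free energies.

    Each argument is a list of per-frame free energies (kcal/mol) taken from
    the same simulation frames. Hypothetical helper for illustration only.
    """
    if not (len(g_complex) == len(g_receptor) == len(g_ligand)):
        raise ValueError("all three series must cover the same frames")
    # Per-frame estimate: G_complex - G_receptor - G_ligand
    per_frame = [c - r - l for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    return mean(per_frame), stdev(per_frame)


# Placeholder energies (kcal/mol) for three frames; not data from the paper.
g_complex = [-5230.0, -5228.0, -5232.0]
g_receptor = [-4900.0, -4899.0, -4901.0]
g_ligand = [-300.0, -301.0, -299.0]

dg_mean, dg_sd = mmpbsa_binding_energy(g_complex, g_receptor, g_ligand)
print(f"dG_bind = {dg_mean:.1f} +/- {dg_sd:.1f} kcal/mol")
```

A more negative mean indicates stronger predicted binding, which is the sense in which the cyclic analogue is reported to bind more tightly than the linear sequence.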
Title: Food-derived linear vs. rationally designed cyclic peptides as potent TNF-alpha inhibitors: an integrative computational study. Journal: Frontiers in Bioinformatics, vol. 5, article 1716375. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12669231/pdf/
Introduction: Histone-lysine N-methyltransferase 2D (KMT2D) is an H3K4 methyltransferase and a potential tumor suppressor with a crucial role in regulating gene expression. Its dysregulation has been implicated in developmental disorders and several types of cancer. Despite this, the molecular mechanisms that govern its activity remain largely elusive. Among these mechanisms, post-translational modifications, especially phosphorylation, act as essential regulators that fine-tune KMT2D stability, localization, and functional interactions to maintain cellular homeostasis. With over 173 phosphorylation sites reported, KMT2D is extensively regulated by kinases, which makes its phospho-regulatory network difficult to explore with targeted in vitro approaches.
Methods: We systematically curated and integrated the global phosphoproteomic datasets, along with their corresponding experimental conditions, to comprehensively identify the phosphorylation events reported for KMT2D. The site exhibiting the highest frequency of detection across these datasets is considered the predominant phosphorylation site. To investigate its functional significance, we analyzed the proteins and their phosphorylation sites that are differentially co-regulated with the predominant site, as well as its associated upstream kinases and interacting proteins.
Results: Among the 173 reported phosphorylation sites of KMT2D, Serine 2274 (S2274) emerged as the predominant site, detected in over 42% of diverse mass spectrometry-based phosphoproteomics datasets. This site lies within one of KMT2D's unique "LSPPP" motifs, suggesting a potential regulatory role. Detailed investigation of the differentially co-regulated protein phosphosites revealed that phosphorylation of KMT2D at S2274 is consistently and positively co-regulated with MAPK1/ERK2 activation, as well as with proteins involved in the MAPK cascade, epigenetic regulation, and cell differentiation. Notably, ERK2 was predicted as an upstream kinase targeting S2274, suggesting that KMT2D S2274 functions as a potential downstream effector of the MEK-ERK signaling pathway, potentially linking it to epigenetic regulation and cell differentiation. Further, our results highlighted a potential mechanistic link between disrupted phosphorylation at S2274 and the pathogenesis of Kabuki syndrome.
Discussion: This study delineates the phosphoregulatory network of KMT2D, positioning it as a dynamic epigenetic effector modulated by MEK-ERK signaling, with broader implications for cancer and developmental disorders.
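The selection rule described in the Methods (the site detected in the largest share of datasets is treated as predominant) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each dataset is reduced to a set of detected site labels, ties are broken alphabetically, and the toy datasets and the labels `site_B` and `site_C` are invented placeholders rather than curated data.

```python
from collections import Counter


def predominant_site(datasets):
    """Return (site, fraction of datasets in which that site was detected).

    `datasets` is an iterable of collections of phosphosite labels, one
    collection per phosphoproteomics dataset. Hypothetical helper.
    """
    datasets = [set(d) for d in datasets]  # count each site once per dataset
    if not datasets:
        raise ValueError("no datasets supplied")
    counts = Counter(site for d in datasets for site in d)
    # Highest detection count wins; ties fall back to label order.
    site, n_detected = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return site, n_detected / len(datasets)


# Toy input: five invented datasets, not the curated data from the study.
toy_datasets = [
    {"S2274", "site_B"},
    {"S2274"},
    {"site_C"},
    {"S2274", "site_C"},
    {"site_B"},
]

top_site, detection_fraction = predominant_site(toy_datasets)
print(top_site, detection_fraction)
```

The same per-dataset representation extends naturally to co-regulation analysis, since each dataset can also carry the direction of change for every site.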
Title: Role of histone-lysine N-methyltransferase 2D (KMT2D) in MEK-ERK signaling-mediated epigenetic regulation: a phosphoproteomics perspective. Authors: Sreeshma Ravindran Kammarambath, Leona Dcunha, Athira Perunelly Gopalakrishnan, Amal Fahma, Neelam Krishna, Altaf Mahin, Samseera Ummar, Prathik Basthikoppa Shivamurthy, Inamul Hasan Madar, Rajesh Raju. Pub Date: 2025-11-18; DOI: 10.3389/fbinf.2025.1683469. Journal: Frontiers in Bioinformatics, vol. 5, article 1683469. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12669113/pdf/
Pub Date: 2025-11-17; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1687617
Anthony Wong, Sanskruthi Guduri, TsungYen Chen, Kunal Patel
Introduction: Multi-target peptide therapeutics targeting glucagon receptor (GCGR), glucagon-like peptide-1 receptor (GLP1R), and glucose-dependent insulinotropic polypeptide receptor (GIPR) represent a promising approach for treating diabetes and obesity. Triple agonist peptides demonstrate promising therapeutic potential compared to single-target approaches, yet rational design remains computationally challenging due to complex sequence-structure-activity relationships. Existing methods, primarily based on convolutional neural networks, impose limitations including fixed sequence lengths and inadequate representation of molecular topology. Graph Attention Networks (GAT) offer advantages in capturing molecular structures and variable-length peptide sequences while providing interpretable insights into receptor-specific binding determinants.
Methods: A dataset of 234 peptide sequences with experimentally determined binding affinities was compiled from multiple sources. Peptides were represented as molecular graphs with seven-dimensional node features encoding physicochemical properties and positional information. The GAT architecture employed a shared encoder with task-specific prediction heads, implementing transfer learning to address limited GIPR training data. Performance was evaluated using 5-fold cross-validation and independent validation on 24 literature-derived sequences. A genetic algorithm framework was developed for peptide sequence optimization, incorporating multi-objective fitness evaluation based on predicted binding affinity, biological plausibility, and sequence novelty.
Results: Cross-validation demonstrated robust GAT performance across all receptors, with GCGR achieving high accuracy (AUC-ROC: 0.915 ± 0.050), followed by GLP1R (AUC-ROC: 0.853 ± 0.059), and GIPR showing acceptable performance despite limited data (AUC-ROC: 0.907 ± 0.083). Comparative analysis revealed receptor-specific advantages: GAT significantly outperformed CNN for GCGR prediction (RMSE: 0.942 vs. 1.209, p = 0.0013), while CNN maintained superior GLP1R performance (RMSE: 0.552 vs. 0.723). Genetic algorithm optimization yielded a measurable improvement over baseline, with a 4.0% fitness enhancement and the generation of 20 candidates exhibiting mean binding probabilities exceeding 0.5 across all targets. The GAT-based framework provides a computational approach to peptide design, demonstrating receptor-specific advantages and robust optimization capabilities.
Conclusion: Genetic algorithm optimization enables systematic exploration of sequence space within existing agonist scaffolds while maintaining biological constraints. This approach provides a rational framework for prioritizing experimental validation efforts in triple agonist development.
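To make the genetic algorithm step more concrete, here is a minimal, self-contained Python sketch of sequence optimization with elitism, one-point crossover, and point mutation. It is not the authors' implementation. The fitness function is a stand-in (fraction of positions matching an arbitrary target string) where the paper combines GAT-predicted binding, biological plausibility, and novelty, and every name and parameter value below is an assumption chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_fitness(seq, target):
    """Placeholder fitness: fraction of positions that match `target`."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def mutate(seq, rng, rate=0.1):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in seq)


def crossover(a, b, rng):
    """One-point crossover between two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, pop_size=30, generations=40, seed=0):
    """Return (best sequence, best fitness recorded at each generation)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    n = len(target)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(n))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda s: toy_fitness(s, target), reverse=True)
        history.append(toy_fitness(pop[0], target))
        elite = pop[: pop_size // 5]  # top 20% survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = elite + children
    pop.sort(key=lambda s: toy_fitness(s, target), reverse=True)
    history.append(toy_fitness(pop[0], target))
    return pop[0], history


# Arbitrary 10-residue target string; it has no biological meaning.
best, history = evolve("ACDEFGHIKL")
print(best, history[0], history[-1])
```

Because the elite are carried over unchanged, the best fitness can never decrease between generations. Restricting mutation to selected positions would be one simple way to keep candidates inside an existing agonist scaffold.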
Title: Machine learning-guided optimization of triple agonist peptide therapeutics for metabolic disease. Journal: Frontiers in Bioinformatics, vol. 5, article 1687617. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665757/pdf/
Pub Date: 2025-11-17; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1663846
Osasan Stephen Adebayo, George Oche Ambrose, Daramola Olusola, Adefolalu Oluwafemi, Hind A Alzahrani, Abdulkarim Hasan
Introduction: KRAS mutations are key oncogenic drivers in lung cancer, yet effective pharmacological targeting has remained a major challenge due to the protein's elusive and dynamic binding pockets. Computational modeling offers a promising route to identify novel inhibitors with improved potency and selectivity.
Methods: A quantitative structure-activity relationship (QSAR) modeling approach was developed to predict the inhibitory potency (pIC50) of KRAS inhibitors and support de novo drug design. Molecular descriptors for 62 inhibitors retrieved from the ChEMBL database (CHEMBL4354832) were computed using Chemopy. Following descriptor normalization and dimensionality reduction, five machine learning algorithms were applied: partial least squares (PLS), random forest (RF), stepwise multiple linear regression (MLR), genetic algorithm-optimized MLR (GA-MLR), and XGBoost. Model performance was evaluated using R², RMSE, and MAE, while permutation-based importance and SHAP analyses provided feature interpretability.
Results: Among the models tested, PLS exhibited the best predictive performance (R² = 0.851; RMSE = 0.292), followed by RF (R² = 0.796). The GA-MLR model, based on eight optimized molecular descriptors, achieved good interpretability and robust internal validation (R² = 0.677). Virtual screening of 56 de novo designed compounds within the model's applicability domain identified compound C9, with a predicted pIC50 of 8.11, as the most promising hit.
Discussion: This integrative QSAR modeling and de novo design framework effectively predicted the bioactivity of KRAS inhibitors and facilitated the identification of novel candidate molecules. The findings demonstrate the utility of combining interpretable machine learning models with virtual screening to accelerate the discovery of potent KRAS inhibitors for lung cancer therapy.
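The quantities reported above can be made concrete with a short Python sketch. It uses the standard definition pIC50 = -log10(IC50 in mol/L) and the usual formulas for R², RMSE, and MAE; the function names and the small example vectors are illustrative assumptions and are not taken from the study.

```python
import math


def pic50_to_ic50_nM(pic50):
    """Convert pIC50 (= -log10 of IC50 in mol/L) to IC50 in nanomolar."""
    return 10 ** (-pic50) * 1e9


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Made-up pIC50 values for four compounds, for illustration only.
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 7.0, 7.5, 9.0]

metrics = regression_metrics(observed, predicted)
c9_ic50_nM = pic50_to_ic50_nM(8.11)
print(metrics, round(c9_ic50_nM, 2))
```

Under this definition, the predicted pIC50 of 8.11 for compound C9 corresponds to an IC50 of roughly 7.8 nM.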
Title: QSAR-guided discovery of novel KRAS inhibitors for lung cancer therapy. Journal: Frontiers in Bioinformatics, vol. 5, article 1663846. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12665777/pdf/
Pub Date: 2025-11-13; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1636240
Aanya Gupta, Koji Abe, Holden T Maecker
Introduction: FluPRINT is a multi-omics dataset that measures donors' protein expression and cell counts across various assays. Each donor was also assigned a binary label: high responder (1) if the hemagglutination inhibition (HAI) antibody titer showed a fold change ≥4 from day 0 to day 28, and low responder (0) otherwise. In this project, we used the MOFA and Stabl algorithms to analyze FluPRINT, estimate the population structure from the data, and identify the most important features for predicting response to the vaccine.
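The responder label defined above reduces to a one-line rule, sketched here in Python. The sketch assumes a single day-0 and day-28 titer per donor; how titers for several vaccine strains are combined is not described in the abstract, and the function name is hypothetical.

```python
def label_responder(titer_day0, titer_day28, threshold=4.0):
    """Return 1 (high responder) if the HAI fold change is >= threshold, else 0."""
    if titer_day0 <= 0:
        raise ValueError("day-0 titer must be positive")
    fold_change = titer_day28 / titer_day0
    return 1 if fold_change >= threshold else 0


# Example titers (reciprocal dilutions), chosen only for illustration.
print(label_responder(10, 40))   # fold change of 4
print(label_responder(10, 20))   # fold change of 2
```

A fold change of exactly 4 counts as a high responder because the threshold is inclusive.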
Methods: Preprocessing of the dataset included removing repeated features, scaling by assay, and removing outliers. Because Stabl does not directly handle missing values, features with a high proportion of missing values were removed and the remaining missing values were ignored.
Results: MOFA identified the top feature in structure extraction as IL neg 2 CD4 pos CD45Ra neg pSTAT5. MOFA explains the variance of the data well while also selecting statistically significant features (p < 0.05). Stabl found the top feature for explaining the outcome to be CD33- CD3+ CD4+ CD25hiCD127low CD161+ CD45RA+ Tregs, which matched the top result of a previously published analysis. MOFA's features achieved an AUROC of 0.616 (95% CI: 0.426-0.806), and Stabl's achieved an AUROC of 0.634 (95% CI: 0.432-0.823).
Discussion: Our research addresses a key knowledge gap: understanding how these fundamentally different analytical approaches perform when analyzing the same complex dataset. Our exploration evaluates their respective strengths, limitations, and biological insights and provides guidance on using MOFA and Stabl to find the best predictive cell subsets and features for understanding large immunological multi-omics data. The code for this project can be found at https://github.com/aanya21gupta/fluprint.
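For readers who want to see how an AUROC and its confidence interval of the kind reported above can be obtained, the following Python sketch computes the AUROC as the probability that a high responder is scored above a low responder, then estimates a 95% interval with a percentile bootstrap. The abstract does not say how the authors derived their intervals, so the bootstrap is an assumption, and the labels and scores are made up.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain one class only
            stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[round(alpha / 2 * n_boot)]
    upper = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up labels (1 = high responder) and model scores for eight donors.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7]

point_estimate = auroc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
print(point_estimate, ci_low, ci_high)
```

An interval that spans 0.5, as both intervals reported in the Results do, means that chance-level discrimination cannot be ruled out at that confidence level.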
Title: Comprehensive analysis of multi-omics vaccine response data using MOFA and Stabl algorithms. Journal: Frontiers in Bioinformatics, vol. 5, article 1636240. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12657425/pdf/
Accurate prediction of antibody paratopes is a critical challenge in structure-limited, high-throughput discovery workflows. We present ParaDeep, a lightweight and interpretable deep learning framework for residue-level paratope prediction directly from amino acid sequences. ParaDeep integrates bidirectional long short-term memory networks with one-dimensional convolutional layers to capture both long-range sequence context and local binding motifs. We systematically evaluated 30 model configurations varying in encoding schemes, convolutional kernel sizes, and antibody chain types. In five-fold cross-validation, heavy (H) chain models achieved the highest performance (F1 = 0.856 ± 0.014, MCC = 0.842 ± 0.015), outperforming light (L) chain models (F1 = 0.774 ± 0.023, MCC = 0.772 ± 0.022). On an independent blind test set, ParaDeep attained F1 = 0.723 and MCC = 0.685 for H chains, and F1 = 0.607 and MCC = 0.587 for L chains, representing a 27% MCC improvement over the sequence-based baseline Parapred. Chain-specific modeling revealed that heavy chains provide stronger sequence-based predictive signals, while light chains benefit more from structural context. ParaDeep approaches the performance of state-of-the-art structure-based methods on heavy chains while requiring only sequence input, enabling faster and broader applicability without the computational cost of 3D modeling. Its efficiency and scalability make it well-suited for early-stage antibody discovery, repertoire profiling, and therapeutic design, particularly in the absence of structural data. The implementation is freely available at https://github.com/PiyachatU/ParaDeep, with Python (PyTorch) code and a Google Colab interface for ease of use.
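The two metrics used throughout this abstract, F1 and the Matthews correlation coefficient (MCC), can both be computed from a residue-level confusion matrix. The Python sketch below applies the standard formulas; the counts are invented for illustration and do not come from the ParaDeep evaluation.

```python
import math


def f1_and_mcc(tp, fp, fn, tn):
    """Return (F1, MCC) for binary paratope / non-paratope residue labels."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc


# Invented counts: 50 true paratope residues among 200 residues in total.
f1, mcc = f1_and_mcc(tp=40, fp=10, fn=10, tn=140)
print(round(f1, 3), round(mcc, 3))
```

MCC also accounts for true negatives, which matters here because paratope residues are usually a minority of each chain; that makes it a useful complement to F1.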
Title: ParaDeep: sequence-based deep learning for residue-level paratope prediction using chain-aware BiLSTM-CNN models. Authors: Piyachat Udomwong, Thanathat Pamonsupornwichit, Kanchanok Kodchakorn, Chatchai Tayapiwatana. Journal: Frontiers in bioinformatics 5:1684042. Pub Date: 2025-11-05. DOI: 10.3389/fbinf.2025.1684042. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12626946/pdf/
Pub Date: 2025-11-04; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1681811
Kylee K Rahm, Branden S Kinghorn, Myanna J Moody, Ben C Stone, Kenton C Strong, Brian S Kim, Yen Jou Chang, Samantha N Sleight, Alyssa A Nitz, David V Hansen, Matthew H Bailey
Introduction: Recent advances in Alzheimer's research suggest that the brain's immune system plays a critical role in the development and progression of this devastating disease. Microglial cells are vital as immune cells in the brain's defense system. Human Microglia Clone 3 (HMC3) is a cell line developed as a promising experimental model to understand the role of microglial cells in human diseases including Alzheimer's and other neurodegenerative diseases. The frequency of HMC3 cell usage has increased in recent years, with the idea that this cell line could serve as a convenient model for human microglial cell functions.
Methods: We utilized gene-pair ratios from bulk and single-cell RNA sequencing (scRNA-seq) expression data to create predictive models of cell-type origins.
Results: Our model reveals that the HMC3 cell line represents various cell types, with the highest cell similarity score relating to astrocytes, not microglia.
Discussion: These findings suggest that the HMC3 cell line is not a reliable human microglia model and that extreme caution should be taken when interpreting the results of studies using the HMC3 cell line.
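The abstract does not say which model was built on the gene-pair ratios, so the sketch below only illustrates the general idea: within-sample ratios between pairs of genes serve as features, and a query sample is assigned to the closest reference cell type. The nearest-centroid rule, the gene names (`geneA` to `geneC`), the cell-type labels, and all expression values are hypothetical stand-ins, not the authors' method or data.

```python
import math
from itertools import combinations

GENES = ["geneA", "geneB", "geneC"]  # placeholder gene names

def pair_log_ratios(expr, genes=GENES, pseudocount=1.0):
    """Log2 ratio for every unordered gene pair within one sample.

    Ratios are taken inside a sample, which reduces dependence on
    per-sample scaling (e.g. sequencing depth).
    """
    return [
        math.log2((expr[a] + pseudocount) / (expr[b] + pseudocount))
        for a, b in combinations(genes, 2)
    ]

def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def predict(features, centroids):
    """Return the label whose centroid is nearest in Euclidean distance."""
    def dist(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    return min(centroids, key=lambda label: dist(features, centroids[label]))

# Made-up reference profiles for two hypothetical cell types.
reference = {
    "type_1": [{"geneA": 100, "geneB": 10, "geneC": 50},
               {"geneA": 200, "geneB": 25, "geneC": 90}],
    "type_2": [{"geneA": 10, "geneB": 120, "geneC": 40},
               {"geneA": 20, "geneB": 210, "geneC": 100}],
}
centroids = {
    label: centroid([pair_log_ratios(sample) for sample in samples])
    for label, samples in reference.items()
}

query = {"geneA": 15, "geneB": 150, "geneC": 60}  # made-up query sample
predicted = predict(pair_log_ratios(query), centroids)
print(predicted)
```

The same feature construction can be applied to bulk and single-cell profiles, since only the relative ordering of genes within each sample enters the features.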
Title: Cellf-deception: human microglia clone 3 (HMC3) cells exhibit more astrocyte-like than microglia-like gene expression. Journal: Frontiers in bioinformatics 5:1681811. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12623408/pdf/
Pub Date: 2025-11-03; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1665892
Ana Stolnicu, Peter Eckhardt-Bellmann, Angelika M R Kestler, Hans A Kestler
Introduction: Numerous biological systems exhibit ordinal connections between categories. Developmental and time-series data inherently depict sequences such as "early," "intermediate," and "late" phases, showing that these processes follow a progression. Ordinal classification techniques are often applied in biological and medical contexts, ranging from the evaluation of pain intensity to the detection of evolving diseases such as cancer. These ranking systems may assist clinicians in establishing diagnoses and developing tailored treatment plans. For instance, tumor staging might guide early detection strategies and targeted therapies, improving patient outcomes. However, applying ordinal classification to biological data presents considerable challenges. In addition to their high dimensionality, these datasets can be highly heterogeneous, often reflecting branching processes that occur simultaneously during progression. Factors such as intratumoral diversity, asynchronous progress, and context-specific signaling activity may interfere with the identification of such alternative development routes.
Methods: To address these challenges, we propose a framework for uncovering ordinal relationships within molecular data. Specifically, directed threshold classifiers are introduced as base learners for ordinal classifier cascades, enabling the detection of both total and partial orderings between molecular states.
Results: This approach preserves the inherent ordinal structure by projecting high-dimensional data onto a single dimension while simultaneously decreasing complexity. Additionally, the distinct features of the resulting thresholds allow the prediction of potential alternative paths among the suborders.
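To make the idea of a cascade of directed threshold classifiers concrete, the sketch below uses one feature and assumes that class rank increases with the feature value. It is a simplified illustration under those assumptions, not the authors' framework; the function names, the midpoint rule for placing thresholds, and the toy values are all hypothetical.

```python
def fit_directed_thresholds(values_by_class):
    """Fit one 'x > t' threshold per pair of consecutive classes.

    values_by_class: lists of 1-D feature values, given in the candidate
    class order (e.g. early, intermediate, late). Returns None when the
    feature does not separate consecutive classes in the assumed
    direction, i.e. it does not support this ordering.
    """
    thresholds = []
    for lower, upper in zip(values_by_class, values_by_class[1:]):
        lower_max, upper_min = max(lower), min(upper)
        if lower_max >= upper_min:
            return None
        thresholds.append((lower_max + upper_min) / 2)
    return thresholds

def predict_cascade(x, thresholds):
    """Walk the cascade; stop at the first threshold that x does not exceed."""
    for rank, t in enumerate(thresholds):
        if x <= t:
            return rank
    return len(thresholds)

# Toy values for one feature across three ordered stages (made up).
early, intermediate, late = [0.1, 0.4], [1.0, 1.3], [2.2, 2.9]
thresholds = fit_directed_thresholds([early, intermediate, late])
print(thresholds)
print([predict_cascade(x, thresholds) for x in (0.2, 1.1, 3.0)])

# A feature that mixes the first two stages does not support this order.
print(fit_directed_thresholds([[0.1, 1.5], [1.0, 1.3], [2.2]]))
```

Running the fit for different candidate class orders, and for different features, is one simple way to see which orderings a feature supports; orders that hold only for subsets of classes correspond to partial orderings.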
Title: Identification of ordinal relations and alternative suborders within high-dimensional molecular data. Journal: Frontiers in bioinformatics 5:1665892. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12620363/pdf/
Pub Date: 2025-10-31; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1705252
Yaan J Jang
Title: Editorial: Computational protein function prediction based on sequence and/or structural data. Journal: Frontiers in bioinformatics 5:1705252. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12615499/pdf/