Pub Date: 2026-01-19 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1676149
Yan Li, Boran Wang, Zhen Liu, Wei Wei, Caiyi Fei, Shi Xu, Tiyun Han, Wei Geng, Zengding Wu
Introduction: Translation initiation and termination are critical regulatory checkpoints in protein synthesis, yet accurate computational prediction of their sites remains challenging due to training data biases and the complexity of full-length transcripts.
Methods: To address these limitations, we present TRANSAID (TRANSlation AI for Detection), a novel deep learning framework that accurately and simultaneously predicts translation initiation (TIS) and termination (TTS) sites from complete transcript sequences. TRANSAID's hierarchical architecture efficiently processes long transcripts, capturing both local motifs and long-range dependencies. Crucially, the model was trained on a human transcriptome dataset that was rigorously partitioned at the gene level to prevent data leakage and included both protein-coding (NM) and non-coding (NR) transcripts.
Results: This mixed-training strategy enables TRANSAID to achieve high fidelity, correctly identifying 73.61% of NR transcripts as non-coding. Performance is further enhanced by an integrated biological scoring system, improving "perfect ORF prediction" for coding sequences to 94.94% and "correct non-coding prediction" to 82.00%. The human-trained model demonstrates remarkable cross-species applicability, maintaining high accuracy on organisms from mammals to yeast. Beyond annotation, TRANSAID serves as a powerful discovery tool for novel coding events. When applied to long-read sequencing data, it accurately identified previously unannotated protein isoforms validated by mass spectrometry (76.28% validation rate). Furthermore, homology searches of high-scoring ORFs predicted within NR transcripts suggest a strong potential for identifying cryptic translation events.
Discussion: As a fully documented open-source tool with a user-friendly web server, TRANSAID provides a powerful and accessible resource for improving transcriptome annotation and proteomic discovery.
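To make the notion of a "perfect ORF prediction" concrete, the following is a minimal sketch of how a predicted TIS/TTS pair could be checked for forming a complete open reading frame. It is an illustration only and not TRANSAID's biological scoring system, which the abstract does not describe in detail. The function name `is_complete_orf` and the 0-based coordinate convention are assumptions made for this example.

```python
# Illustrative sketch only: NOT the scoring used by TRANSAID.
# Assumed convention (hypothetical): tis and tts are 0-based indices of the
# first base of the start codon and of the stop codon, respectively.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(transcript: str, tis: int, tts: int) -> bool:
    """Return True if the TIS/TTS pair delimits a complete ORF."""
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA alphabet
    # Both codons must lie inside the transcript, with the stop downstream.
    if tis < 0 or tts <= tis or tts + 3 > len(seq):
        return False
    # Start and stop codons must be in the same reading frame.
    if (tts - tis) % 3 != 0:
        return False
    if seq[tis:tis + 3] != "ATG":
        return False
    if seq[tts:tts + 3] not in STOP_CODONS:
        return False
    # No premature in-frame stop codon between start and stop.
    for i in range(tis, tts, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


if __name__ == "__main__":
    toy = "GGATGAAATTTTAGCC"  # ATG at index 2, TAG at index 11
    print(is_complete_orf(toy, 2, 11))
```

A check of this kind captures only the structural definition of an ORF; a prediction can pass it and still disagree with the annotated coding sequence.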
Title: TRANSAID: a hybrid deep learning framework for translation site prediction with integrated biological feature scoring. Journal: Frontiers in bioinformatics, volume 5, pages 1676149. Periodical IF: 3.9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12862215/pdf/
Pub Date: 2026-01-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1769816
Xiaoxi Hao, Dingjian Liang, Yimin Shen, Cuimin Sun, Wei Lan
Introduction: Bacterial pneumonia remains a major global health challenge, and early pathogen identification is important for timely and targeted treatment. However, conventional microbiological diagnostics such as sputum or blood culture are labor-intensive and time-consuming.
Methods: We propose an interpretable ensemble learning framework (PreBP) for rapid pathogen identification using routinely available complete blood count (CBC) parameters. We analyzed 1,334 CBC samples from patients with culture-confirmed bacterial pneumonia caused by four major pathogens: Pseudomonas aeruginosa, Escherichia coli, Staphylococcus aureus, and Streptococcus pneumoniae. Pathogen labels were determined based on clinical culture results. Five machine learning models (extreme gradient boosting (XGBoost), multilayer perceptron neural network (MLPNN), adaptive boosting (AdaBoost), random forest (RF), and extremely randomized trees (ExtraTrees)) were trained as comparators, and PreBP was developed with metaheuristic-optimized hyperparameters. Key CBC biomarkers were refined using a dual-phase feature selection strategy combining Lasso and Boruta. To enhance transparency, SHapley additive explanations (SHAP) were applied to provide both global biomarker importance and local, case-level explanations.
Results: PreBP achieved the best overall performance, with an AUC of 0.920, precision of 87.1%, and accuracy and sensitivity of 86.7%.
Discussion: Because the framework relies on routine CBC measurements, it can generate interpretable predictions as soon as CBC results are available, which may provide supplementary evidence for earlier pathogen-oriented clinical decision-making alongside culture-dependent workflows. Overall, PreBP offers an interpretable computational approach for pathogen identification in bacterial pneumonia based on routine laboratory data.
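The reported accuracy and sensitivity are identical (86.7%). One possible explanation, offered here as an assumption because the abstract does not state the averaging scheme, is that sensitivity was averaged over the four pathogen classes with support weighting: in a multiclass setting, support-weighted recall is always equal to accuracy. The sketch below demonstrates that identity on made-up labels; the class abbreviations and data are hypothetical and are not taken from the study.

```python
from collections import Counter

# Illustrative sketch with made-up labels; not data from the PreBP study.


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall (sensitivity) for each class present in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / support[c] for c in support}


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class size."""
    support = Counter(y_true)
    recall = per_class_recall(y_true, y_pred)
    return sum(support[c] * recall[c] for c in support) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    recall = per_class_recall(y_true, y_pred)
    return sum(recall.values()) / len(recall)


# Hypothetical labels for four classes (abbreviations are placeholders).
y_true = ["PA"] * 4 + ["EC"] * 3 + ["SA"] * 2 + ["SP"] * 1
y_pred = ["PA", "PA", "PA", "EC", "EC", "EC", "SA", "SA", "SA", "PA"]

if __name__ == "__main__":
    print("accuracy        ", accuracy(y_true, y_pred))
    print("weighted recall ", weighted_recall(y_true, y_pred))
    print("macro recall    ", macro_recall(y_true, y_pred))
```

Macro-averaged recall generally differs from accuracy when classes are imbalanced, so knowing which average was used matters when comparing the figures with other studies.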
Title: PreBP: an interpretable, optimized ensemble framework using routine complete blood count for rapid pathogen identification in bacterial pneumonia. Journal: Frontiers in bioinformatics, volume 5, pages 1769816. Periodical IF: 3.9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12847367/pdf/
Pub Date: 2026-01-14 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1745495
Nandha Kumar Subramani, Subhashree Venugopal, Anand Prem Rajan
Introduction: Methicillin-resistant Staphylococcus aureus (MRSA) is a multidrug-resistant bacterium responsible for severe infections and has become a major health concern. Because of the constraints of traditional methods, a new approach is needed to prevent MRSA-related infections by targeting key pathogens.
Methods: Subtractive genomics was first applied to the MRSA proteome to identify non-homologous, essential, and virulence targets using comparative BLAST-based screening. Immunoinformatic tools were then employed for B- and T-cell epitope prediction and vaccine construction with appropriate adjuvants and linkers, followed by immune simulation and molecular docking with immune receptors.
Results: Comparative metabolic pathway analysis identified 294 MRSA pathway proteins, with acetolactate synthase (ALS) emerging as a non-homologous, essential, and virulent protein involved in the branched-chain amino acid biosynthesis pathway. The constructed ALS vaccine, comprising 3 B-cell and 19 T-cell epitopes, exhibited stable immunological features with 97.55% global population coverage. Molecular docking indicated that ALS bound the TLR4 receptor (-1,438.7 kcal/mol) with higher affinity than the TLR2 receptor (-1,103.5 kcal/mol), a result further supported by structural stability and compactness analyses. Immune simulations also showed elevated IgM, IgG subtypes, and cytokine production, suggesting robust humoral and cellular immunity.
Discussion: The identification of ALS highlights its biological relevance to MRSA survival. The stability predictions with TLR4 suggest effective activation of innate immunity that may enhance antigen presentation and downstream adaptive immunity. Validating the ALS vaccine's safety and immunogenicity still requires comprehensive in vitro and in vivo studies.
Conclusion: Thus, ALS is recognized as a promising MRSA vaccine candidate and has the potential to activate immune responses effectively.
Title: An integrated subtractive genomics and immunoinformatic approach for designing a multi-epitope peptide vaccine against methicillin-resistant Staphylococcus aureus. Journal: Frontiers in bioinformatics, volume 5, pages 1745495. Periodical IF: 3.9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12847441/pdf/
Pub Date: 2026-01-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1713975
Zeyu Zou, Ziheng Duan
Spatial transcriptomics (ST) technologies enable the profiling of gene expression while preserving spatial context, offering unprecedented insights into tissue organization. However, traditional spatial domain identification methods primarily rely on gene expression matrices and spatial coordinates while overlooking the rich biological knowledge encoded in gene functional descriptions. Here, we propose SpaLLM, a general framework that integrates large language model (LLM) embeddings of gene descriptions with conventional spatial transcriptomics analysis. Our approach leverages pre-computed GenePT embeddings from NCBI gene summaries to create biologically informed gene representations. SpaLLM combines these LLM-derived gene features with cell-gene expression matrices through matrix multiplication, generating enriched cell representations that capture both expression patterns and functional knowledge. These enriched features are then integrated with existing graph-based spatial analysis methods for improved spatial domain identification. Extensive validation on 12 sequencing-based Visium sections and an independent imaging-based osmFISH dataset demonstrates that SpaLLM consistently enhances spatial domain identification. Our modular framework can be seamlessly integrated with existing spatial analysis pipelines, making it broadly applicable to diverse research scenarios.
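The matrix-multiplication step described above can be sketched as follows: a cell-by-gene expression matrix multiplied by a gene-by-dimension embedding matrix yields one embedding-space vector per cell (or spot). This is a minimal illustration under stated assumptions, not the SpaLLM implementation: the toy values, the two-dimensional embeddings, and the helper `matmul` are hypothetical, and any normalization the authors apply is not shown because the abstract does not specify it.

```python
# Illustrative sketch only; not the SpaLLM code. All values are toy data.


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of lists."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Toy expression matrix: 3 cells/spots (rows) x 4 genes (columns).
expression = [
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

# Toy gene embeddings: 4 genes (rows) x 2 dimensions (columns).
# Real LLM-derived embeddings would have far more dimensions.
gene_embeddings = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
    [0.0, 2.0],
]

# Each output row is an expression-weighted sum of gene embedding vectors.
enriched = matmul(expression, gene_embeddings)

if __name__ == "__main__":
    for row in enriched:
        print(row)
```

With NumPy the same operation is simply `X @ E`; the pure-Python version is used here only to keep the example free of dependencies. The rows of `enriched` would then serve as node features for a graph-based spatial method.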
Title: SpaLLM: a general framework for spatial domain identification with large language models. Journal: Frontiers in bioinformatics, volume 5, pages 1713975. Periodical IF: 3.9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12833451/pdf/
Pub Date: 2026-01-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1712577
Joshua Stephenson, Konda Reddy Karnati
Accurately predicting protein-ligand binding affinity is key in drug discovery. Machine Learning and Deep Learning methods used in the drug discovery process have advanced the prediction of drug-target binding affinities, particularly for G protein-coupled receptors (GPCRs), a pharmacologically significant yet structurally heterogeneous protein family. In this review, binding affinity prediction models are examined and organized according to sequence-based one-dimensional, graph-based two-dimensional, and structure-based three-dimensional frameworks. Sequence-based models utilize convolutional neural networks for high-throughput screening. Recently published models incorporated attention mechanisms and self-supervised learning, enhancing interpretability and reducing dependence on annotated datasets. Graph-based models employ graph neural networks and molecular contact maps to capture topological features, enabling substructure-sensitive predictions. Structure-based approaches integrate spatial and conformational data into high-resolution interaction models. The hybrid use of these three approaches could significantly increase the success rate of in silico models for drug discovery, particularly for GPCRs.
Title: Recent trends in machine learning and deep learning-based prediction of G-protein coupled receptor-ligand binding affinities. Journal: Frontiers in bioinformatics, volume 5, pages 1712577. Periodical IF: 3.9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832930/pdf/
Pub Date: 2026-01-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1740715
Yi Zhang, Yongqian Liu, Wei Meng, Xiaobo Yu, Xiaojun Xu
Background: Intracerebral hemorrhage (ICH) triggers secondary brain injury through neuroinflammation, yet the interplay between metabolic reprogramming and inflammatory responses remains poorly defined. This study investigated how glucose metabolism dysregulation contributes to neuroinflammatory pathogenesis following ICH.
Methods: We integrated transcriptomic datasets from bulk RNA sequencing (human perihematomal tissue), single-cell RNA sequencing (mouse ICH model), and spatial transcriptomics (mouse time-series). Bioinformatic analyses included differential expression screening, single-cell weighted gene co-expression network analysis, pseudotemporal trajectory reconstruction, and cell-cell communication inference to identify key metabolic-inflammation regulators and their spatiotemporal dynamics.
Results: Multi-omics convergence revealed hexokinase 2 (HK2), heat shock protein A5 (HSPA5), and tumor necrosis factor (TNF) as core regulators linking glucose metabolism to neuroinflammation. Single-cell analysis showed significant time-dependent regulation of HK2 in microglia, while spatial transcriptomics uncovered synchronized alterations of HK2, HSPA5, and TNF in perihematomal regions at day 7. Cell communication analysis highlighted enhanced microglia-to-neutrophil signaling via Tnf-Tnfrsf1b pairs, with TNF signaling identified as the most significantly upregulated pathway in ICH conditions.
Conclusion: Our multi-omics approach reveals coordinated dysregulation of glucose metabolism and inflammatory genes following ICH, with time-dependent HK2 regulation in microglia and synchronized transcriptional changes at day 7 representing critical events in neuroinflammatory progression. The identified gene networks and cellular communication patterns provide new insights into the metabolic-immune interface in ICH, offering potential targets for future therapeutic strategies.
Title: Integrative transcriptomic analysis reveals microglial metabolic-inflammatory crosstalk of HK2-HSPA5-TNF axis after intracerebral hemorrhage. Journal: Frontiers in bioinformatics, volume 5, pages 1740715. Periodical IF: 3.9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12833071/pdf/
Pub Date: 2026-01-12 | eCollection Date: 2025-01-01 | DOI: 10.3389/fbinf.2025.1697212
Iveth Gómez-Morales, Adriana P Mendizabal-Ruiz, J Alejandro Morales, Teresa Romero-Gutiérrez
Introduction: The axolotl (Ambystoma mexicanum) offers deep insight into brain regeneration because it fully reconstructs its telencephalon after injury, a capability that most vertebrates lack. This study aimed to identify hub genes (highest-weighted genes) underlying this process and to map their cellular location by analyzing spatiotemporal transcriptomic data using high-dimensional weighted gene co-expression network analysis, integrating protein-protein interaction networks, and cross-validating findings against the literature.
Results: We identified 180 hub genes across the regeneration timeline, including several with conserved orthologs previously reported in vertebrate regeneration models. Among these candidates, TRH (Thyrotropin-Releasing Hormone) displayed the most consistent spatiotemporal pattern, appearing repeatedly as a hub gene and localizing to MSN-enriched regions at multiple stages. TRH is broadly characterized in vertebrates as a neuroendocrine peptide with roles in hormonal signaling, and MSNs are known to respond to a variety of hormonal and neuropeptidergic cues. In our dataset, this background provides additional perspective on the transcriptional configurations in which TRH appears. Other hub genes showed stage- or cell-specific patterns, together outlining a heterogeneous and dynamic landscape of transcriptional states detected during telencephalon regeneration.
Conclusion: This study provides a descriptive map of gene co-expression dynamics during axolotl telencephalon regeneration. By integrating hdWGCNA, spatial transcriptomics, and network-based context, we identify hub genes and transcriptional states associated with injury response, including a persistent TRH-linked MSN state. These findings offer a foundation for future experimental studies aimed at elucidating the molecular basis of axolotl brain repair.
Title: High-dimensional co-expression network analysis reveals persistent TRH gene expression throughout axolotl telencephalon regeneration. Journal: Frontiers in bioinformatics, volume 5, pages 1697212. Periodical IF: 3.9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832632/pdf/
Pub Date: 2026-01-12; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1703356
Srineevas Sriram, Chandresh Palanichamy, P T Subash, Manshi Kumari Gupta, C Sudandiradoss
Introduction: The Re1-Silencing Transcription Factor (REST) is a master regulator of gene silencing, orchestrating transcriptional repression by tethering chromatin-modifying co-repressors to the Re1 motif of target genes. While REST is recognized as a sentinel of cellular identity, its role in uterine fibroids (UF) remains unclear. This study aims to investigate how structural perturbations in REST may compromise its regulatory function and contribute to altered transcriptional control in fibroid biology.
Methods: A deep structural interrogation of REST was performed through expansive in silico analysis of 938 missense SNPs. Evolutionary conservation was assessed across ten primate species to identify structurally disruptive variants. Structural modelling, protein-protein and protein-DNA docking analyses were conducted to evaluate interactions with co-repressors and DNA. Molecular dynamics simulations were used to assess conformational stability, flexibility, compactness, and energetic changes in wild-type and mutant REST variants.
Results: Five structurally disruptive REST variants (Y31C, Y31D, L76Q, Y283C, L427Q) were identified at evolutionarily conserved residues. Structural modelling and docking analyses revealed weakened affinity for co-repressors, with the Y283C variant showing a marked reduction in SIN3A interaction (Z-score: 2.4 to -1.2) and impaired DNA binding (Z-score: 2.0 to -1.3). Molecular dynamics simulations demonstrated that Y283C increased rigidity (RMSF: 0.33 to 0.27 nm), reduced compactness (Rg: 3.48 to 3.51 nm), and lowered potential energy. Upon Re1 binding, destabilization intensified, with increased RMSD (0.95 to 1.07 nm) and pronounced shifts in energy.
Discussion: This integrative analysis highlights REST as a candidate regulatory component in uterine fibroid biology. Structural disruption of REST, particularly through the Y283C mutation, may destabilize molecular interactions and compromise DNA-binding precision, potentially unleashing transcriptional noise that fuels fibroid growth. These findings suggest that perturbation of REST-mediated transcriptional repression may be associated with altered regulatory control in this disease and could inform future strategies to investigate dysregulation in uterine fibroids.
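The flexibility and compactness measures quoted in the Results (RMSD, RMSF, Rg) have standard definitions, which the following minimal NumPy sketch illustrates. It assumes a trajectory that has already been aligned to a reference and stored as an array of shape (frames, atoms, 3) in nm, and it assumes equal atomic masses for Rg. The function names and toy coordinates are hypothetical and are not taken from the authors' pipeline.

```python
import numpy as np


def rmsd(frame, reference):
    """Root-mean-square deviation between two (atoms, 3) coordinate sets.

    No superposition is performed here; the frames are assumed pre-aligned.
    """
    diff = frame - reference
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about the time-averaged position."""
    mean_pos = trajectory.mean(axis=0)                    # (atoms, 3)
    sq_dev = ((trajectory - mean_pos) ** 2).sum(axis=2)   # (frames, atoms)
    return np.sqrt(sq_dev.mean(axis=0))                   # (atoms,)


def radius_of_gyration(frame):
    """Radius of gyration of one frame, assuming equal atomic masses."""
    centered = frame - frame.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


# Toy data: 200 frames of 50 "atoms" jittering around a random reference.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
trajectory = reference + 0.05 * rng.normal(size=(200, 50, 3))

per_atom_rmsf = rmsf(trajectory)
rmsd_last_frame = rmsd(trajectory[-1], reference)
rg_last_frame = radius_of_gyration(trajectory[-1])
```

Under these definitions a lower mean RMSF corresponds to a more rigid structure and a larger Rg to a less compact one, which is how the Y283C values in the Results are read.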
Title: REST missense mutations reveal disrupted Re1 motif binding and co-repressor interactions in uterine fibroids.
Journal: Frontiers in bioinformatics, vol. 5, 1703356. Publication date: 2026-01-12. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12832642/pdf/
Introduction: Citron Rho-interacting serine/threonine kinase (CIT) is a major cytosolic protein kinase essential for midbody organisation, abscission, and cytokinesis. Dysregulation and mutations in CIT are associated with multiple cancers and neurodevelopmental disorders, including microcephaly. Although global phosphoproteomic studies have identified more than 50 phosphosites in CIT, their functional relevance and the kinases regulating them remain largely unexplored.
Methods: To systematically investigate the phosphoregulation of CIT, we curated and integrated global phosphoproteomic datasets, along with their associated experimental conditions, to comprehensively catalogue phosphorylation events reported for CIT. To assess the functional significance of CIT, we examined proteins that were differentially co-regulated with its predominant phosphosite.
Results: Serine 440 (S440), located outside the kinase domain, was identified as the predominant phosphosite of CIT, representing over 55% of CIT-associated phospho-signalling events across 100 experimental conditions, including Enterovirus A71 infection, metformin, and interleukin-33. Motif analysis revealed the presence of a D(S/T)P/P(S/T)D motif recognised by the CIT kinase domain, suggesting S440 as a predicted autophosphorylation site. Co-phosphoregulation analysis identified 136 interacting proteins and 82 predicted substrates that were positively co-regulated with CIT_S440. The resulting phospho-regulatory network comprised essential cell cycle and DNA repair regulators, including MDC1 and TRIP12. Notably, over 120 co-regulated phosphosites were functionally linked to DNA repair and cell cycle regulation. Aberrant phosphorylation of CIT_S440 observed across cancers of the breast, colon, and bladder suggests CIT_S440 as a potential onco-phosphosite critically involved in cellular checkpoint signalling.
Discussion: These findings suggest that CIT_S440 is a promising therapeutic target, and the phosphosite-centric regulatory network derived in this study could serve as a platform to evaluate phosphosite-specific therapeutic interventions.
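The motif scan mentioned in the Results can be illustrated with a short regular-expression sketch. It reads the notation D(S/T)P/P(S/T)D as two alternative tripeptides, D-[S/T]-P or P-[S/T]-D, which is an interpretation of the abstract rather than the authors' stated procedure. The peptide below is a made-up toy sequence, not the CIT sequence, and the function name is hypothetical.

```python
import re

# Lookahead so that overlapping occurrences (e.g. "DSPSD") are all reported.
MOTIF = re.compile(r"(?=(D[ST]P|P[ST]D))")


def find_motif_sites(sequence):
    """Return (1-based position of the central S/T, matched tripeptide) pairs."""
    return [(m.start() + 2, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Toy peptide for illustration only.
toy_sequence = "MKDSPAAPTDGG"
sites = find_motif_sites(toy_sequence)
print(sites)  # [(4, 'DSP'), (9, 'PTD')]
```

Reporting the position of the central residue makes the output directly comparable with phosphosite labels such as S440.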
Title: CIT kinase phosphorylation as significant regulatory node for cellular checkpoints.
Authors: Jaytha Thomas, Fathimathul Lubaba, Mukhtar Ahmed, Althaf Mahin, Levin John, Athira Perunelly Gopalakrishnan, Suhail Subair, Prathik Basthikoppa Shivamurthy, Rajesh Raju, Sowmya Soman
Journal: Frontiers in bioinformatics, vol. 5, 1734030. Publication date: 2026-01-12. DOI: 10.3389/fbinf.2025.1734030. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12833521/pdf/
Pub Date: 2026-01-08; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1657021
Angela Wang, Elena Xiao, Jason Cheng, Xiaoxi Shen
Background: As a driving force of the Fourth Industrial Revolution, deep learning methods have achieved significant success across various fields, including genetic and genomic studies. While individual-level genetic data is ideal for deep learning models, privacy concerns and data-sharing restrictions often limit its availability to researchers.
Methods: In this paper, we investigated the potential applications of deep learning models (including deep neural networks, convolutional neural networks, recurrent neural networks, and transformers) when only genetic summary data, such as linkage disequilibrium matrices, are available. The bootstrap method was used to approximate the test error. Simulation studies and real data analyses were conducted to compare the performance of deep learning methods in genetic risk prediction using individual-level genetic data versus genetic summary data.
Results: The test mean squared errors (MSEs) of most applied deep learning models are comparable when using individual-level data versus summary data.
Conclusion: Our results suggest that suitable deep learning methods could also serve as an alternative approach to predict disease-related traits when only linkage disequilibrium matrices are available as input.
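The abstract does not say which bootstrap variant was used to approximate the test error. As an illustration of the general idea only, the sketch below uses the common out-of-bag form: refit on each resample and score on the observations left out of it. An ordinary least-squares fit stands in for a deep learning model, and the simulated genotype dosages, effect sizes, and function names are all hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares coefficients; a simple stand-in for any trainable model."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def bootstrap_oob_mse(X, y, n_boot=200, seed=0):
    """Average out-of-bag mean squared error over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # sample rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)   # rows never drawn this round
        if oob.size == 0:
            continue
        coef = fit_linear(X[idx], y[idx])
        errors.append(np.mean((y[oob] - X[oob] @ coef) ** 2))
    return float(np.mean(errors))


# Simulated data: 300 individuals, 10 variants coded as 0/1/2 dosages.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(300, 10)).astype(float)
beta = rng.normal(size=10)
y = X @ beta + rng.normal(scale=1.0, size=300)

estimate = bootstrap_oob_mse(X, y)
```

Because the simulated noise has unit variance, the estimate should land near 1, slightly above it once estimation error in the coefficients is included.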
Title: Genetic risk predictions using deep learning models with summary data.
Journal: Frontiers in bioinformatics, vol. 5, 1657021. Publication date: 2026-01-08. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12823927/pdf/