
Frontiers in Bioinformatics: Latest Articles

TRANSAID: a hybrid deep learning framework for translation site prediction with integrated biological feature scoring.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-19 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1676149
Yan Li, Boran Wang, Zhen Liu, Wei Wei, Caiyi Fei, Shi Xu, Tiyun Han, Wei Geng, Zengding Wu

Introduction: Translation initiation and termination are critical regulatory checkpoints in protein synthesis, yet accurate computational prediction of their sites remains challenging due to training data biases and the complexity of full-length transcripts.

Methods: To address these limitations, we present TRANSAID (TRANSlation AI for Detection), a novel deep learning framework that accurately and simultaneously predicts translation initiation (TIS) and termination (TTS) sites from complete transcript sequences. TRANSAID's hierarchical architecture efficiently processes long transcripts, capturing both local motifs and long-range dependencies. Crucially, the model was trained on a human transcriptome dataset that was rigorously partitioned at the gene level to prevent data leakage and included both protein-coding (NM) and non-coding (NR) transcripts.
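
The gene-level partitioning mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the hash-based assignment, the 20% test fraction, and the transcript and gene identifiers are all assumptions made for illustration. The point it shows is that every transcript of a gene lands in the same split, so isoforms of one gene cannot leak between training and test data.

```python
import hashlib


def assign_split(gene_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a gene to 'train' or 'test' from a hash of its ID."""
    digest = hashlib.sha256(gene_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_by_gene(transcripts, test_fraction: float = 0.2):
    """Split (transcript_id, gene_id) pairs so that a gene never spans both sets."""
    splits = {"train": [], "test": []}
    for transcript_id, gene_id in transcripts:
        splits[assign_split(gene_id, test_fraction)].append(transcript_id)
    return splits


# Hypothetical records: (transcript accession, gene symbol).
records = [
    ("NM_0001.1", "GENE_A"),
    ("NM_0002.1", "GENE_A"),  # second isoform of GENE_A
    ("NR_0003.1", "GENE_B"),
    ("NM_0004.1", "GENE_C"),
    ("NR_0005.1", "GENE_C"),  # non-coding transcript of GENE_C
]
splits = split_by_gene(records)
```

Splitting on the gene identifier rather than the transcript identifier is the design choice that matters here; a random transcript-level split would place near-identical isoforms on both sides.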

Results: This mixed-training strategy enables TRANSAID to achieve high fidelity, correctly identifying 73.61% of NR transcripts as non-coding. Performance is further enhanced by an integrated biological scoring system, improving "perfect ORF prediction" for coding sequences to 94.94% and "correct non-coding prediction" to 82.00%. The human-trained model demonstrates remarkable cross-species applicability, maintaining high accuracy on organisms from mammals to yeast. Beyond annotation, TRANSAID serves as a powerful discovery tool for novel coding events. When applied to long-read sequencing data, it accurately identified previously unannotated protein isoforms validated by mass spectrometry (76.28% validation rate). Furthermore, homology searches of high-scoring ORFs predicted within NR transcripts suggest a strong potential for identifying cryptic translation events.
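
The abstract does not define "perfect ORF prediction", so the following is only a sketch of one plausible consistency check for a predicted TIS/TTS pair. It assumes 0-based coordinates, an ATG start codon (non-AUG initiation is ignored), and the three standard stop codons; the function name `is_consistent_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_consistent_orf(seq: str, tis: int, tts: int) -> bool:
    """Check that a predicted (TIS, TTS) pair delimits a well-formed ORF.

    tis: 0-based index of the first base of the start codon.
    tts: 0-based index of the first base of the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    if tis < 0 or tts <= tis or tts + 3 > len(seq):
        return False
    if seq[tis:tis + 3] != "ATG":          # assumed canonical start codon
        return False
    if (tts - tis) % 3 != 0:               # start and stop must share a frame
        return False
    if seq[tts:tts + 3] not in STOP_CODONS:
        return False
    # Reject ORFs interrupted by an earlier in-frame stop codon.
    for i in range(tis + 3, tts, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return False
    return True


toy_transcript = "GGATGAAACCCTAAGG"  # ATG at index 2, TAA at index 11
print(is_consistent_orf(toy_transcript, 2, 11))
```

A check like this only tests internal consistency; agreement with the annotated coding sequence would need a separate comparison against reference coordinates.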

Discussion: As a fully documented open-source tool with a user-friendly web server, TRANSAID provides a powerful and accessible resource for improving transcriptome annotation and proteomic discovery.

Citations: 0
PreBP: an interpretable, optimized ensemble framework using routine complete blood count for rapid pathogen identification in bacterial pneumonia.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-14 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1769816
Xiaoxi Hao, Dingjian Liang, Yimin Shen, Cuimin Sun, Wei Lan

Introduction: Bacterial pneumonia remains a major global health challenge, and early pathogen identification is important for timely and targeted treatment. However, conventional microbiological diagnostics such as sputum or blood culture are labor-intensive and time-consuming.

Methods: We propose an interpretable ensemble learning framework (PreBP) for rapid pathogen identification using routinely available complete blood count (CBC) parameters. We analyzed 1,334 CBC samples from patients with culture-confirmed bacterial pneumonia caused by four major pathogens: Pseudomonas aeruginosa, Escherichia coli, Staphylococcus aureus, and Streptococcus pneumoniae. Pathogen labels were determined based on clinical culture results. Five machine learning models (extreme gradient boosting (XGBoost), multilayer perceptron neural network (MLPNN), adaptive boosting (AdaBoost), random forest (RF), and extremely randomized trees (ExtraTrees)) were trained as comparators, and PreBP was developed with metaheuristic-optimized hyperparameters. Key CBC biomarkers were refined using a dual-phase feature selection strategy combining Lasso and Boruta. To enhance transparency, SHapley additive explanations (SHAP) were applied to provide both global biomarker importance and local, case-level explanations.

Results: PreBP achieved the best overall performance, with an AUC of 0.920, precision of 87.1%, and accuracy and sensitivity of 86.7%.
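
Accuracy and sensitivity are reported as the same value. One possible explanation, offered here as an assumption because the abstract does not state the averaging scheme, is support-weighted averaging across the four pathogen classes: in that case weighted recall is algebraically identical to accuracy. The sketch below demonstrates the identity on an invented confusion matrix; the counts are illustrative and are not the study's data.

```python
import math


def per_class_counts(confusion):
    """confusion[i][j] = samples of true class i predicted as class j."""
    n = len(confusion)
    counts = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(confusion[i][k] for i in range(n)) - tp
        counts.append((tp, fp, fn))
    return counts


def summarize(confusion):
    """Return accuracy plus support-weighted precision and recall."""
    counts = per_class_counts(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(tp for tp, _, _ in counts) / total
    weighted_precision = 0.0
    weighted_recall = 0.0
    for tp, fp, fn in counts:
        support = tp + fn
        recall = tp / support if support else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        weighted_recall += recall * support / total
        weighted_precision += precision * support / total
    return {
        "accuracy": accuracy,
        "weighted_precision": weighted_precision,
        "weighted_recall": weighted_recall,
    }


# Invented 4-class confusion matrix, for illustration only.
toy_confusion = [
    [40, 3, 2, 1],
    [4, 35, 3, 2],
    [2, 2, 30, 4],
    [1, 3, 2, 38],
]
metrics = summarize(toy_confusion)
print(math.isclose(metrics["accuracy"], metrics["weighted_recall"]))
```

Weighted precision, by contrast, generally differs from accuracy, which would be consistent with a separately reported precision figure.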

Discussion: Because the framework relies on routine CBC measurements, it can generate interpretable predictions once CBC results are available, which may provide supplementary evidence for earlier pathogen-oriented clinical decision-making alongside culture-dependent workflows. Overall, PreBP offers an interpretable computational approach for pathogen identification in bacterial pneumonia based on routine laboratory data.

Citations: 0
An integrated subtractive genomics and immunoinformatic approach for designing a multi-epitope peptide vaccine against methicillin-resistant Staphylococcus aureus.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-14 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1745495
Nandha Kumar Subramani, Subhashree Venugopal, Anand Prem Rajan

Introduction: Methicillin-resistant Staphylococcus aureus (MRSA) is a multidrug-resistant bacterium responsible for severe infections and has become a major health concern. Given the constraints of traditional methods, a new approach is needed to prevent MRSA-related infections by targeting key pathogens.

Methods: First, subtractive genomics was applied to the MRSA proteome to identify non-homologous, essential, and virulence targets using comparative BLAST-based screening. Immunoinformatic tools were then employed for B- and T-cell epitope prediction and for vaccine construction with appropriate adjuvants and linkers, followed by immune simulation and molecular docking with immune receptors.

Results: Comparative metabolic pathway analysis identified 294 MRSA pathway proteins, with acetolactate synthase (ALS) emerging as a non-homologous, essential, and virulent protein involved in the branched-chain amino acid biosynthesis pathway. The constructed ALS vaccine, comprising 3 B-cell and 19 T-cell epitopes, exhibited stable immunological features with 97.55% global population coverage. Molecular docking revealed that ALS had a stronger binding affinity for the TLR4 receptor (-1,438.7 kcal/mol) than for the TLR2 receptor (-1,103.5 kcal/mol), which was further confirmed by analyses showing high structural stability and compactness. Immune simulations also showed elevated IgM, IgG subtypes, and cytokine production, suggesting robust humoral and cellular immunity.

Discussion: The identification of ALS highlights its biological relevance to MRSA survival. The stability predictions with TLR4 suggested effective activation of innate immunity that may enhance antigen presentation and downstream adaptive immunity. Validating the safety and immunogenicity of the ALS vaccine will require comprehensive in vitro and in vivo studies.

Conclusion: Thus, ALS is recognized as a promising MRSA vaccine candidate and has the potential to activate immune responses effectively.

Citations: 0
SpaLLM: a general framework for spatial domain identification with large language models.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1713975
Zeyu Zou, Ziheng Duan

Spatial transcriptomics (ST) technologies enable the profiling of gene expression while preserving spatial context, offering unprecedented insights into tissue organization. However, traditional spatial domain identification methods primarily rely on gene expression matrices and spatial coordinates while overlooking the rich biological knowledge encoded in gene functional descriptions. Here, we propose SpaLLM, a general framework that integrates large language model (LLM) embeddings of gene descriptions with conventional spatial transcriptomics analysis. Our approach leverages pre-computed GenePT embeddings from NCBI gene summaries to create biologically informed gene representations. SpaLLM combines these LLM-derived gene features with cell-gene expression matrices through matrix multiplication, generating enriched cell representations that capture both expression patterns and functional knowledge. These enriched features are then integrated with existing graph-based spatial analysis methods for improved spatial domain identification. Extensive validation on 12 sequencing-based Visium sections and an independent imaging-based osmFISH dataset demonstrates that SpaLLM consistently enhances spatial domain identification. Our modular framework can be seamlessly integrated with existing spatial analysis pipelines, making it broadly applicable to diverse research scenarios.
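
The matrix-multiplication step described above can be sketched in a few lines. This is an illustration under stated assumptions, not SpaLLM's code: the function names are hypothetical, the embeddings are tiny made-up vectors rather than real GenePT embeddings, and any normalization the authors may apply is omitted. Given an expression matrix X (cells by genes) and an embedding matrix E (genes by embedding dimensions), the enriched cell representation is the product X E (cells by embedding dimensions).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]
        for row in a
    ]


def enrich_cells(expression, gene_embeddings):
    """Project cells into embedding space: (cells x genes) @ (genes x dims)."""
    return matmul(expression, gene_embeddings)


# Toy inputs: 3 cells (or spots), 2 genes, 3-dimensional gene embeddings.
expression = [
    [1, 0],  # cell expressing only gene 0
    [0, 2],  # cell expressing only gene 1
    [1, 1],  # cell expressing both genes
]
gene_embeddings = [
    [1, 0, 0],  # made-up embedding of gene 0
    [0, 1, 0],  # made-up embedding of gene 1
]
cell_features = enrich_cells(expression, gene_embeddings)
print(cell_features)
```

Each output row is an expression-weighted sum of gene embeddings, so cells with similar expression over functionally related genes end up close together even when they express different individual genes.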

Citations: 0
Recent trends in machine learning and deep learning-based prediction of G-protein coupled receptor-ligand binding affinities.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1712577
Joshua Stephenson, Konda Reddy Karnati

Accurately predicting protein-ligand binding affinity is key in drug discovery. Machine Learning and Deep Learning methods used in the drug discovery process have advanced the prediction of drug-target binding affinities, particularly for G protein-coupled receptors (GPCRs), a pharmacologically significant yet structurally heterogeneous protein family. In this review, binding affinity prediction models are examined and organized according to sequence-based one-dimensional, graph-based two-dimensional, and structure-based three-dimensional frameworks. Sequence-based models utilize convolutional neural networks for high-throughput screening. Recently published models incorporated attention mechanisms and self-supervised learning, enhancing interpretability and reducing dependence on annotated datasets. Graph-based models employ graph neural networks and molecular contact maps to capture topological features, enabling substructure-sensitive predictions. Structure-based approaches integrate spatial and conformational data into high-resolution interaction models. The hybrid use of these three approaches could significantly increase the success rate of in silico models for drug discovery, particularly for GPCRs.

Citations: 0
Integrative transcriptomic analysis reveals microglial metabolic-inflammatory crosstalk of HK2-HSPA5-TNF axis after intracerebral hemorrhage.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1740715
Yi Zhang, Yongqian Liu, Wei Meng, Xiaobo Yu, Xiaojun Xu

Background: Intracerebral hemorrhage (ICH) triggers secondary brain injury through neuroinflammation, yet the interplay between metabolic reprogramming and inflammatory responses remains poorly defined. This study investigated how glucose metabolism dysregulation contributes to neuroinflammatory pathogenesis following ICH.

Methods: We integrated transcriptomic datasets from bulk RNA sequencing (human perihematomal tissue), single-cell RNA sequencing (mouse ICH model), and spatial transcriptomics (mouse time-series). Bioinformatic analyses included differential expression screening, single-cell weighted gene co-expression network analysis, pseudotemporal trajectory reconstruction, and cell-cell communication inference to identify key metabolic-inflammation regulators and their spatiotemporal dynamics.

Results: Multi-omics convergence revealed hexokinase 2 (HK2), heat shock protein A5 (HSPA5), and tumor necrosis factor (TNF) as core regulators linking glucose metabolism to neuroinflammation. Single-cell analysis showed significant time-dependent regulation of HK2 in microglia, while spatial transcriptomics uncovered synchronized alterations of HK2, HSPA5, and TNF in perihematomal regions at day 7. Cell communication analysis highlighted enhanced microglia-to-neutrophil signaling via Tnf-Tnfrsf1b pairs, with TNF signaling identified as the most significantly upregulated pathway in ICH conditions.

Conclusion: Our multi-omics approach reveals coordinated dysregulation of glucose metabolism and inflammatory genes following ICH, with time-dependent HK2 regulation in microglia and synchronized transcriptional changes at day 7 representing critical events in neuroinflammatory progression. The identified gene networks and cellular communication patterns provide new insights into the metabolic-immune interface in ICH, offering potential targets for future therapeutic strategies.

Citations: 0
High-dimensional co-expression network analysis reveals persistent TRH gene expression throughout axolotl telencephalon regeneration.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1697212
Iveth Gómez-Morales, Adriana P Mendizabal-Ruiz, J Alejandro Morales, Teresa Romero-Gutiérrez

Introduction: The axolotl (Ambystoma mexicanum) can fully reconstruct its telencephalon after injury, a capability that most vertebrates lack, which makes it a valuable model of brain regeneration. This study aimed to identify hub genes (the highest-weighted genes) underlying this process and to map their cellular location by analyzing spatiotemporal transcriptomic data with high-dimensional weighted gene co-expression network analysis (hdWGCNA), integrating protein-protein interaction networks, and cross-validating the findings against the literature.
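As a rough illustration of what "highest-weighted" can mean in a weighted co-expression network, the sketch below ranks genes by their summed soft-thresholded correlation with every other gene. It is not the hdWGCNA pipeline used in the study: the function name `connectivity`, the power `beta=6`, and the simulated expression matrix are all assumptions chosen for illustration.

```python
import numpy as np

def connectivity(expr, beta=6):
    """Weighted-network connectivity per gene for expr of shape (samples, genes)."""
    corr = np.corrcoef(expr, rowvar=False)   # gene-by-gene Pearson correlation
    adjacency = np.abs(corr) ** beta         # unsigned soft-threshold adjacency
    np.fill_diagonal(adjacency, 0.0)         # ignore self-connections
    return adjacency.sum(axis=1)             # total connection weight of each gene

# Simulated data (hypothetical): genes 0-3 share one latent factor, genes 4-5 are noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
expr = np.column_stack([
    factor + 0.1 * rng.normal(size=200),     # gene 0 tracks the factor most closely
    factor + 0.5 * rng.normal(size=200),
    factor + 0.5 * rng.normal(size=200),
    factor + 0.5 * rng.normal(size=200),
    rng.normal(size=200),
    rng.normal(size=200),
])

scores = connectivity(expr)
hub = int(np.argmax(scores))                 # index of the most connected gene
print(hub, np.round(scores, 2))
```

Ranking within a single module, rather than across all genes, only requires passing that module's columns to the same function.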

Results: We identified 180 hub genes across the regeneration timeline, including several with conserved orthologs previously reported in vertebrate regeneration models. Among these candidates, TRH (Thyrotropin-Releasing Hormone) displayed the most consistent spatiotemporal pattern, appearing repeatedly as a hub gene and localizing to MSN-enriched regions at multiple stages. TRH is broadly characterized in vertebrates as a neuroendocrine peptide with roles in hormonal signaling, and MSNs are known to respond to a variety of hormonal and neuropeptidergic cues. This background provides additional perspective on the transcriptional configurations in which TRH appears in our dataset. Other hub genes showed stage- or cell-specific patterns, together outlining a heterogeneous and dynamic landscape of transcriptional states detected during telencephalon regeneration.

Conclusion: This study provides a descriptive map of gene co-expression dynamics during axolotl telencephalon regeneration. By integrating hdWGCNA, spatial transcriptomics, and network-based context, we identify hub genes and transcriptional states associated with injury response, including a persistent TRH-linked MSN state. These findings offer a foundation for future experimental studies aimed at elucidating the molecular basis of axolotl brain repair.

Citations: 0
REST missense mutations reveal disrupted Re1 motif binding and co-repressor interactions in uterine fibroids.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1703356
Srineevas Sriram, Chandresh Palanichamy, P T Subash, Manshi Kumari Gupta, C Sudandiradoss

Introduction: The Re1-Silencing Transcription Factor (REST) is a master regulator of gene silencing, orchestrating transcriptional repression by tethering chromatin-modifying co-repressors to the Re1 motif of target genes. While REST is recognized as a sentinel of cellular identity, its role in uterine fibroids (UF) remains unclear. This study aims to investigate how structural perturbations in REST may compromise its regulatory function and contribute to altered transcriptional control in fibroid biology.

Methods: A deep structural interrogation of REST was performed through expansive in silico analysis of 938 missense SNPs. Evolutionary conservation was assessed across ten primate species to identify structurally disruptive variants. Structural modelling, protein-protein and protein-DNA docking analyses were conducted to evaluate interactions with co-repressors and DNA. Molecular dynamics simulations were used to assess conformational stability, flexibility, compactness, and energetic changes in wild-type and mutant REST variants.

Results: Five structurally disruptive REST variants (Y31C, Y31D, L76Q, Y283C, L427Q) were identified at evolutionarily conserved residues. Structural modelling and docking analyses revealed weakened affinity for co-repressors, with the Y283C variant showing a marked reduction in SIN3A interaction (Z-score: 2.4 to -1.2) and impaired DNA binding (Z-score: 2.0 to -1.3). Molecular dynamics simulations demonstrated that Y283C increased rigidity (RMSF: 0.33 to 0.27 nm), reduced compactness (Rg: 3.48-3.51 nm), and lowered potential energy. Upon Re1 binding, destabilization intensified, with increased RMSD (0.95-1.07 nm) and pronounced shifts in energy.
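The two trajectory metrics quoted above, RMSF and radius of gyration (Rg), have standard definitions that are easy to state in code. The sketch below assumes the coordinates are already aligned to a reference and held in NumPy arrays; the function names and the toy coordinates are hypothetical, and the abstract does not say which simulation software the study used.

```python
import numpy as np

def rmsf(traj):
    """Per-atom RMSF for a pre-aligned trajectory of shape (frames, atoms, 3)."""
    mean_pos = traj.mean(axis=0)                    # time-averaged position of each atom
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared displacement per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))             # root of the time-averaged squared displacement

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for one frame of shape (atoms, 3)."""
    center = np.average(coords, axis=0, weights=masses)   # center of mass
    sq_dist = ((coords - center) ** 2).sum(axis=1)        # squared distance of each atom from it
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Toy inputs (hypothetical): atom 0 moves by 0.2 along x between two frames, atom 1 is fixed.
traj = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.2, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
# Four equal-mass atoms on the corners of a square of side 2.
square = np.array([[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0]])

fluct = rmsf(traj)
rg = radius_of_gyration(square, np.ones(4))
print(fluct, rg)
```

A lower RMSF means residues fluctuate less around their average positions, while a larger Rg means the structure is more extended, which is how the rigidity and compactness figures above are read.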

Discussion: This integrative analysis highlights REST as a candidate regulatory component in uterine fibroid biology. Structural disruption of REST, particularly through the Y283C mutation, may destabilize molecular interactions and compromise DNA-binding precision, potentially unleashing transcriptional noise that fuels fibroid growth. These findings suggest that perturbation of REST-mediated transcriptional repression may be associated with altered regulatory control in this disease and could inform future strategies to investigate dysregulation in uterine fibroids.

Citations: 0
CIT kinase phosphorylation as significant regulatory node for cellular checkpoints.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-12 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1734030
Jaytha Thomas, Fathimathul Lubaba, Mukhtar Ahmed, Althaf Mahin, Levin John, Athira Perunelly Gopalakrishnan, Suhail Subair, Prathik Basthikoppa Shivamurthy, Rajesh Raju, Sowmya Soman

Introduction: Citron Rho-interacting serine/threonine kinase (CIT) is a major cytosolic protein kinase essential for midbody organisation, abscission, and cytokinesis. Dysregulation and mutations in CIT are associated with multiple cancers and neurodevelopmental disorders, including microcephaly. Although global phosphoproteomic studies have identified more than 50 phosphosites in CIT, their functional relevance and the kinases regulating them remain largely unexplored.

Methods: To systematically investigate the phosphoregulation of CIT, we curated and integrated global phosphoproteomic datasets, along with their associated experimental conditions, to comprehensively catalogue phosphorylation events reported for CIT. To assess the functional significance of CIT, we examined proteins that were differentially co-regulated with its predominant phosphosite.

Results: Serine 440 (S440), located outside the kinase domain, was identified as the predominant phosphosite of CIT, accounting for over 55% of CIT-associated phospho-signalling events across 100 experimental conditions, including Enterovirus A71 infection, metformin, and interleukin-33. Motif analysis revealed the presence of a D(S/T)P/P(S/T)D motif recognised by the CIT kinase domain, suggesting S440 as a predicted autophosphorylation site. Co-phosphoregulation analysis identified 136 interacting proteins and 82 predicted substrates that were positively co-regulated with CIT_S440. The resulting phospho-regulatory network comprised essential cell cycle and DNA repair regulators, including MDC1 and TRIP12. Notably, over 120 co-regulated phosphosites were functionally linked to DNA repair and cell cycle regulation. Aberrant phosphorylation of CIT_S440 observed across cancers of the breast, colon, and bladder suggests CIT_S440 as a potential onco-phosphosite critically involved in cellular checkpoint signalling.
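A motif of this kind can be located in a protein sequence with a short regular-expression scan. The sketch below reads D(S/T)P/P(S/T)D as "either D-(S/T)-P or P-(S/T)-D", which is an interpretation of the notation rather than something the abstract spells out; the function name `find_motif_sites` and the toy peptide are hypothetical, and the peptide is not the CIT sequence.

```python
import re

# Either orientation of the motif: D-(S/T)-P or P-(S/T)-D.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(D[ST]P|P[ST]D))")

def find_motif_sites(sequence):
    """Return (1-based position of the central S/T, matched tripeptide) pairs."""
    sites = []
    for match in MOTIF.finditer(sequence.upper()):
        # match.start() is the 0-based index of the first motif residue,
        # so the central S/T sits at 1-based position start + 2.
        sites.append((match.start() + 2, match.group(1)))
    return sites

toy_sequence = "MKDSPAALPTDGGDTPSD"  # hypothetical peptide for illustration only
sites = find_motif_sites(toy_sequence)
print(sites)
```

Reporting the position of the central serine or threonine, rather than the start of the match, keeps the output directly comparable with phosphosite labels such as S440.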

Discussion: These findings suggest that CIT_S440 is a promising therapeutic target, and the phosphosite-centric regulatory network derived in this study could serve as a platform to evaluate phosphosite-specific therapeutic interventions.

Citations: 0
Genetic risk predictions using deep learning models with summary data.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2026-01-08 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1657021
Angela Wang, Elena Xiao, Jason Cheng, Xiaoxi Shen

Background: As a driving force of the Fourth Industrial Revolution, deep learning methods have achieved significant success across various fields, including genetic and genomic studies. While individual-level genetic data is ideal for deep learning models, privacy concerns and data-sharing restrictions often limit its availability to researchers.

Methods: In this paper, we investigated the potential applications of deep learning models (including deep neural networks, convolutional neural networks, recurrent neural networks, and transformers) when only genetic summary data, such as linkage disequilibrium matrices, are available. The bootstrap method was used to approximate the test error. Simulation studies and real data analyses were conducted to compare the performance of deep learning methods in genetic risk prediction using individual-level genetic data versus genetic summary data.
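The abstract does not describe the bootstrap procedure in detail, so the following is only a generic sketch of one common variant: the out-of-bag bootstrap, in which each resample trains a model and the rows left out of that resample serve as test data. A least-squares fit stands in for the deep learning models, and the function name, simulated genotype features, and effect sizes are all assumptions.

```python
import numpy as np

def bootstrap_test_mse(X, y, n_boot=200, seed=0):
    """Out-of-bag bootstrap estimate of test MSE for a least-squares fit."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    errors = []
    for _ in range(n_boot):
        train_idx = rng.integers(0, n, size=n)   # sample rows with replacement
        oob_mask = np.ones(n, dtype=bool)
        oob_mask[train_idx] = False              # rows never drawn act as test data
        if not oob_mask.any():
            continue                             # skip the rare resample with no held-out rows
        coef, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        residual = y[oob_mask] - X[oob_mask] @ coef
        errors.append(np.mean(residual ** 2))
    return float(np.mean(errors))

# Simulated data (hypothetical): 300 individuals, 5 standardized genotype features,
# and a trait with unit-variance noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
beta = np.array([0.5, -0.3, 0.0, 0.2, 0.1])
y = X @ beta + rng.normal(scale=1.0, size=300)

estimate = bootstrap_test_mse(X, y)
print(round(estimate, 3))
```

Because the simulated noise has unit variance, the estimate should land near 1, which gives a quick sanity check on the resampling logic before swapping in a more complex model.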

Results: The test mean squared errors (MSEs) of most applied deep learning models are comparable when using individual-level data versus summary data.

Conclusion: Our results suggest that suitable deep learning methods could also serve as an alternative approach to predict disease-related traits when only linkage disequilibrium matrices are available as input.

Citations: 0