
Frontiers in Bioinformatics: latest articles

BC-predict: mining of signal biomarkers and production of models for early-stage breast cancer subtyping and prognosis.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-18 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1644695
Sangeetha Muthamilselvan, Natarajan Vaithilingam, Ashok Palaniappan

Introduction: Disease heterogeneity is the hallmark of breast cancer, which is the most common female malignancy. With a disturbing increase in mortality and disease burden, there remains a need for effective early-stage theragnostic and prognostic biomarkers. In this work, we improved on BrcaDx (https://apalania.shinyapps.io/brcadx/) for cancer vs control screening and examined a cluster of adjoining learning problems in breast cancer heterogeneity: (i) identification of metastatic cancers; (ii) molecular subtyping (TNBC, HER2, or luminal); and (iii) histological subtyping (invasive ductal or invasive lobular).

Methods: We analyzed the transcriptomic profiles of breast cancer patients from public-domain databases such as the TCGA using stage-encoded problem-specific statistical models of gene expression and unveiled stage-salient and progression-significant genes. Using a consensus approach, we identified potential machine learning features, and considered six model classes for each learning problem, with hyperparameter optimization on a training dataset and evaluation on a holdout test dataset. A nested approach enabled us to identify the best model class for each learning problem.

Results: External validation of the best models yielded balanced accuracies of 97.42% for cancer vs normal; 88.22% for metastatic vs non-metastatic; 88.79% for ternary molecular subtyping; and an ensemble accuracy of 94.23% for histological subtyping. The model for molecular subtyping was validated on a 26-sample TNBC-only out-of-distribution cohort, yielding 25 correct predictions. We performed a late integration of multi-omics datasets by validating the feature space used in each problem with miRNA profiles, methylation profiles, and commercial breast cancer panels.
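The balanced accuracies quoted in these results can be illustrated with a minimal sketch. It assumes the standard definition (the mean of per-class recall), which the abstract does not spell out, and the labels below are hypothetical toy values, not data from the study. The last line only restates the reported 25-of-26 TNBC result as a percentage.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (8 cancer, 2 normal); not study data.
y_true = ["cancer"] * 8 + ["normal"] * 2
y_pred = ["cancer"] * 8 + ["cancer", "normal"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)

# Reported TNBC-only cohort: 25 of 26 predictions correct.
tnbc_accuracy_pct = round(25 / 26 * 100, 2)

print(plain_accuracy, bal_acc, tnbc_accuracy_pct)
```

On imbalanced data the plain accuracy can look high while the minority class is poorly recalled, which is the usual reason to report the balanced figure.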

Discussion: Pending prospective studies, we have translated the models into BC-Predict, which forks the best models developed for each problem into a unified interface and provides a complete readout for input instances of expression data, including uncertainty estimates. BC-Predict is freely available for non-commercial purposes at: https://apalania.shinyapps.io/BC-Predict.

Citations: 0
Key genes associated with brain metastasis in non-small cell lung cancer: novel insights from bioinformatics analysis.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-18 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1625664
Shuang Zhao, He Zhang

Background: This study aims to investigate potential biomarkers associated with brain metastasis in non-small cell lung cancer (NSCLC-BM) and elucidate their regulatory roles in critical pathways involved in cerebral metastatic dissemination.

Methods: The identified differentially expressed genes (DEGs) were subjected to functional enrichment analysis. Protein-protein interaction (PPI) networks were predicted using the STRING database and visualized with Cytoscape. Hub genes were subsequently screened from the PPI network to construct a transcription factor (TF)-miRNA regulatory network. Subsequent analyses included survival analysis, immune infiltration assessment, and comprehensive mutational profiling.

Results: Among the 56 identified DEGs, 19 were upregulated while 37 were downregulated. Gene Ontology (GO) enrichment analysis revealed significant enrichment in immune response, signaling receptor binding, and extracellular region. KEGG pathway analysis demonstrated predominant involvement in cytokine-cytokine receptor interaction and the chemokine signaling pathway. Through Cytoscape-based screening, we identified 10 hub genes: CD19, CD27, IL7R, SELL, CCL5, CCR5, PRF1, GZMK, GZMA, and TIGIT. The TF-miRNA regulatory network analysis uncovered 6 transcription factors (STAT5A/B, NFKB1, EGR1, RELA, and CTCF) and 4 miRNAs (hsa-miR-204, hsa-miR-148b, hsa-miR-618, and hsa-miR-103) as critical transcriptional and post-transcriptional regulators of DEGs. Integrated analyses including Kaplan-Meier survival curves, immune infiltration profiling, and comprehensive mutational analysis demonstrated significant associations with brain metastatic progression in the studied cohort.
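The abstract does not say which criterion the Cytoscape-based screening used to pick hub genes. As a rough illustration only, the sketch below assumes the simplest common choice, ranking nodes by degree (number of interaction partners). The function name `top_hubs` and the node names `G1` to `G5` are hypothetical placeholders, and the edges are invented for the example; none of this represents real STRING interactions or the authors' actual procedure.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k nodes with the highest degree in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Sort by descending degree, then by name so ties resolve deterministically.
    return sorted(degree, key=lambda node: (-degree[node], node))[:k]


# Hypothetical toy network; placeholder node names, not real gene interactions.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
hubs = top_hubs(edges, 2)
print(hubs)
```

Other centrality measures (betweenness, maximal clique centrality, and so on) can rank the same network differently, so the chosen criterion matters when comparing hub lists across studies.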

Conclusion: This study provides novel biomarkers from a unique perspective for the diagnosis, prognosis, and development of molecular-targeted therapies or immunotherapies for brain metastasis in NSCLC.

Citations: 0
Drug discovery for chemotherapeutic resistance based on pathway-responsive gene sets and its application in breast cancer.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-16 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1661601
Dehua Feng, Jingwen Hao, Lingxu Li, Jian Chen, Xinying Liu, Ruijie Zhang, Huirui Han, Tianyi Li, Xuefeng Wang, Xia Li, Lei Yu, Bing Li, Jin Li, Limei Wang

Introduction: Chemotherapy response variability in cancer patients necessitates novel strategies targeting chemoresistant populations. While combinatorial regimens show promise through synergistic pharmacological interactions, traditional pathway enrichment methods relying on static gene sets fail to capture drug-induced dynamic transcriptional perturbations.

Methods: To address this challenge, we developed the Pathway-Responsive Gene Sets (PRGS) framework to systematically identify chemoresistance-associated pathways and guide therapeutic intervention. Comparative evaluation of three computational strategies (GSEA-like method, Hypergeometric test-based method, Bates test-based method) revealed that the GSEA-like methodology exhibited superior performance, enabling precise identification of drug-induced pathway dysregulation.
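Of the three strategies compared above, the hypergeometric test is the easiest to show compactly. The sketch below is a generic over-representation test, not the PRGS implementation: it assumes a universe of `N` genes, a gene set of size `K`, a list of `n` perturbed genes, and an overlap of `k`, all of which are hypothetical parameter names chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the universe, K: genes in the gene set,
    n: genes drawn (e.g. perturbed genes), k: observed overlap.
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical toy numbers: universe of 10 genes, gene set of 4,
# 3 perturbed genes, 2 of which fall in the gene set.
p_value = hypergeom_enrichment_p(N=10, K=4, n=3, k=2)
print(p_value)
```

Unlike a GSEA-like running-sum statistic, this test needs a hard cutoff to define the perturbed gene list and ignores how genes are ranked within it.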

Results: Key experimental findings demonstrated PRGS's superiority over conventional Pathway Member Gene Sets (PMGS), exhibiting statistical independence (p < 0.0001) and enhanced detection of chemotherapy-driven pathway dysregulation. Application of PRGS to the GDSC dataset identified 8 resistance-associated pathways. Screening of agents targeting these pathways yielded candidates with predicted anti-resistance activity. An in vitro cellular experiment demonstrated that the bortezomib-bleomycin combination exhibited synergistic cytotoxicity (IDAcomboScore = 0.014) in T47D cells, highlighting the potential of PRGS-guided therapeutic strategies.

Discussion: This study establishes a PRGS-based methodological framework that integrates genomic perturbations with precision oncology, demonstrating its capacity to decode resistance mechanisms and guide therapeutic development through dynamic pathway analysis.

Citations: 0
Discovering biomarkers for chronic sinusitis with nasal polyps: a study integrating bioinformatics analysis and experimental validation of macrophage polarization and metabolism-related genes.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-15 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1613136
Juan Zhou, Huan Wang, Jin Wang, Fuming Zhou

Background: Macrophages play a critical role in chronic rhinosinusitis with nasal polyps (CRSwNP), and their functional imbalance may cause metabolic disturbances. However, the mechanisms of their role in CRSwNP remain unclear. This study aimed to identify CRSwNP biomarkers related to macrophage polarization and metabolism, and elucidate their molecular regulatory mechanisms.

Methods: Transcriptomic data for CRSwNP were obtained from public databases. Differentially expressed genes (DEGs) were screened via differential expression analysis. Weighted gene co-expression network analysis (WGCNA) was then used to identify key module genes associated with macrophage polarization-related genes (MP-RGs), and these were cross-referenced with metabolism-related genes to obtain candidate genes. Two machine learning methods, least absolute shrinkage and selection operator (LASSO) and random forest (RF), were applied to further screen the candidates. Receiver operating characteristic (ROC) curves were constructed for the training and validation sets, and gene expression validation was conducted to determine the final biomarkers. Finally, reverse transcription-quantitative polymerase chain reaction (RT-qPCR) was used to verify the expression levels of the identified biomarkers.

Results: ALOX5, HMOX1, and PLA2G7 were identified as biomarkers for CRSwNP, with AUC >0.7 in both training and validation sets, showing strong diagnostic potential. A nomogram, built on these three biomarkers, exhibited superior diagnostic performance. Enrichment analysis suggested that these biomarkers might be implicated in immune pathways. Furthermore, all three biomarkers were found to be correlated with asthma. Selenium was identified as a co-target of ALOX5 and HMOX1, presenting potential therapeutic targets for CRSwNP. A total of 10 key miRNAs regulating these biomarkers were identified, and the upstream long non-coding RNAs of hsa-miR-642a-5p, including FOXC1 and NEAT1, were predicted. Additionally, the transcription factor FOXC1 was found to concurrently regulate all three biomarkers. RT-qPCR results validated that the expression levels of ALOX5, HMOX1, and PLA2G7 were significantly elevated in CRSwNP patients, corroborating the findings from bioinformatics analyses.
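The AUC threshold quoted above can be made concrete with a small sketch. It uses the rank-based interpretation of ROC AUC: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the scores are hypothetical; they are not expression values from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical biomarker scores for patients (positive) and controls (negative).
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2]
auc = roc_auc(patients, controls)
print(auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.7 indicates useful but not perfect discrimination.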

Conclusion: ALOX5, HMOX1, and PLA2G7 were identified as biomarkers linked to macrophage polarization and metabolism in CRSwNP. These findings offer new insights for early prevention strategies and clinical drug development in CRSwNP.

Citations: 0
Single-cell splicing QTL analysis in pancreatic islets.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-10 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1657895
Jae-Won Cho, Jingyi Cao, Martin Hemberg

Introduction: Alternative splicing (AS) of mRNAs is a highly conserved mechanism which can greatly expand the functional diversity of the transcriptome. Aberrant splicing underpins many diseases, and a better understanding of AS can provide insights regarding the molecular mechanisms involved. Importantly, AS can be affected by genetic variants and several studies have indicated large numbers of splicing quantitative trait loci (sQTL). With the advance of single-cell technology, expression QTL studies have been expanded to identify cell type level variants.

Methods: We collected eight full-length scRNA-seq pancreatic islet datasets. Each individual was genotyped with the CTAT pipeline and Strelka2. Isoforms were quantified with RSEM. Finally, sQTLs were identified with sQTLseekeR2.

Results: We identified 228 cell-type-level sQTLs for alpha and beta cells across 152 genes. In particular, our study highlights four variants affecting CDC42, a gene related to cell morphology, that were not observed in bulk sQTL analysis.

Discussion: Our results provide a proof of concept that it is possible to identify cell type level sQTLs, and we envision that better powered studies will allow us to further uncover the genetic regulation of splicing.

Citations: 0
Structure-based prediction of SARS-CoV-2 variant properties using machine learning on mutational neighborhoods.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-08 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1634111
Max van den Boom, Erik Schultes, Thomas Hankemeier

This dataset presents a structure-enriched resource of theoretical and empirical SARS-CoV-2 spike receptor-binding domain (RBD) variants, developed under the STAYAHEAD project for pandemic preparedness. It integrates large-scale in silico structure predictions with empirical biophysical measurements. The dataset includes 3,705 single-point Wuhan-Hu-1 RBD variants and 100 higher-order Omicron BA.1/BA.2 variants, annotated with AlphaFold2 and ESMFold metrics and Bio2Byte sequence-based predictors. Structural descriptors (RMSD, TM-score, pLDDT, solvent accessibility, hydrophobicity, and aggregation propensity) are linked to ACE2 binding and expression data from deep mutational scanning. Provided as a FAIR² Data Package, it supports structure-function analysis, variant modeling, and responsible reuse in virology, structural biology, and computational protein science. This collaboration was co-funded by the PPP Allowance from Health~Holland, Top Sector Life Sciences and Health, to stimulate public-private partnerships.
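Of the structural descriptors listed, RMSD is the simplest to write down. The sketch below assumes two structures with the same atoms in the same order that have already been superimposed; it performs no alignment (for example, no Kabsch fit), and how the dataset itself computes RMSD is not stated in the abstract. The coordinates are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned 3D coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms: the variant is shifted by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
variant = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(reference, variant)
print(deviation)
```

Without a prior superposition step, a rigid translation or rotation of an otherwise identical structure inflates the value, which is why the toy example here yields 1.0 for a pure shift.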

Citations: 0
PharmacoForge: pharmacophore generation with diffusion models.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-09-08 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1628800
Emma L Flynn, Riya Shah, Ian Dunn, Rishal Aggarwal, David Ryan Koes

Structure-based drug design (SBDD) is enhanced by machine learning (ML) to improve both virtual screening and de novo design. Despite advances in ML tools for both strategies, screening remains bounded by time and computational cost, while generative models frequently produce invalid and synthetically inaccessible molecules. Screening time can be improved with pharmacophore search, which quickly identifies ligands in a database that match a pharmacophore query. In this work, we introduce PharmacoForge, a diffusion model for generating 3D pharmacophores conditioned on a protein pocket. Generated pharmacophore queries identify ligands that are guaranteed to be valid, commercially available molecules. We evaluate PharmacoForge against automated pharmacophore generation methods using the LIT-PCBA benchmark and ligand generative models through a docking-based evaluation framework. We further assess pharmacophore quality through a retrospective screening of the DUD-E dataset. PharmacoForge surpasses other pharmacophore generation methods in the LIT-PCBA benchmark, and resulting ligands from pharmacophore queries performed similarly to de novo generated ligands when docking to DUD-E targets and had lower strain energies compared to de novo generated ligands.

Citations: 0
An image analysis pipeline to quantify the spatial distribution of cell markers in stroma-rich tumors.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-09-05 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1619790
Antoine A Ruzette, Nina Kozlova, Kayla A Cruz, Taru Muranen, Simon F Nørrelykke

Aggressive cancers, such as pancreatic ductal adenocarcinoma (PDAC), are often characterized by a complex and desmoplastic tumor microenvironment, a stroma-rich, supportive connective tissue composed primarily of extracellular matrix (ECM) and non-cancerous cells. Desmoplasia, a dense deposition of stroma, is a major reason for therapy resistance, acting both as a physical barrier that interferes with drug penetration and as a supportive niche that protects cancer cells through diverse mechanisms. Precise understanding of spatial cell interactions in stroma-rich tumors is essential for optimizing therapeutic responses. It enables detailed mapping of stromal-tumor interfaces, comprehensive cell phenotyping, and insights into changes in tissue architecture, improving assessment of drug responses. Recent advances in multiplexed immunofluorescence imaging have enabled the acquisition of large batches of whole-slide tumor images, but scalable and reproducible methods to analyze the spatial distribution of cell states relative to stromal regions remain limited. To address this gap, we developed an open-source computational pipeline that integrates QuPath, StarDist, and custom Python scripts to quantify biomarker expression at single-cell and subcellular resolution across entire tumor sections. Our workflow includes: (i) automated nuclei segmentation using StarDist, (ii) machine learning-based cell classification using multiplexed marker expression, (iii) modeling of stromal regions based on fibronectin staining, (iv) sensitivity analyses on classification thresholds to ensure robustness across heterogeneous datasets, and (v) distance-based quantification of the proximity of each cell to the stromal border. To improve consistency across slides with variable staining intensities, we introduce a statistical strategy that translates classification thresholds by propagating a chosen reference percentile across the distribution of marker-related cell measurements in each image. We apply this approach to quantify the spatial distribution of the phosphorylated form of the N-Myc downregulated gene 1 (NDRG1), a novel DNA repair protein that conveys signals from the ECM to the nucleus to maintain replication fork homeostasis, and of the known cell proliferation marker Ki67 in fibronectin-defined stromal regions in PDAC xenografts. The pipeline is applicable to the analysis of markers of interest in stroma-rich tissues and is publicly available.
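
The threshold-translation step described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the synthetic per-cell intensities, and the reference threshold of 4.0 are all assumptions made for the example. The idea is to find which percentile a manually chosen threshold occupies in a reference image, then read off the intensity at that same percentile in each other image.

```python
import numpy as np


def reference_percentile(reference_values, reference_threshold):
    """Percentile rank (0-100) of a chosen threshold within the reference image."""
    reference_values = np.asarray(reference_values, dtype=float)
    return 100.0 * np.mean(reference_values <= reference_threshold)


def propagate_threshold(image_values, percentile):
    """Intensity at the same percentile in another image's per-cell distribution."""
    return float(np.percentile(np.asarray(image_values, dtype=float), percentile))


# Hypothetical per-cell marker intensities; slide B is stained about 2x brighter.
rng = np.random.default_rng(0)
slide_a = rng.lognormal(mean=1.0, sigma=0.5, size=5000)
slide_b = 2.0 * rng.lognormal(mean=1.0, sigma=0.5, size=5000)

# Threshold chosen by eye on slide A (assumed value), translated to slide B.
p = reference_percentile(slide_a, reference_threshold=4.0)
threshold_b = propagate_threshold(slide_b, p)

# The fraction of cells called positive is now comparable across slides.
positive_fraction_a = np.mean(slide_a > 4.0)
positive_fraction_b = np.mean(slide_b > threshold_b)
print(f"percentile={p:.1f}, threshold_b={threshold_b:.2f}")
```

Under this scheme a fixed intensity cutoff is never reused across slides; only the percentile is shared, which is what makes the classification less sensitive to overall staining brightness.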

Citations: 0
TCRscape: a single-cell multi-omic TCR profiling toolkit.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-09-05 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1641491
Roman Perik-Zavodskii, Olga Perik-Zavodskaia, Marina Volynets, Saleh Alrhmoun, Sergey Sennikov

Introduction: Single-cell multi-omics has transformed T-cell biology by enabling the simultaneous analysis of T-cell receptor (TCR) sequences, transcriptomes, and surface proteins at the resolution of individual cells. These capabilities are critical for identifying antigen-specific T-cells and accelerating the development of TCR-based immunotherapies.

Methods: Here, we introduce TCRscape, an open-source Python 3 tool designed for high-resolution T-cell receptor clonotype discovery and quantification, optimized for BD Rhapsody™ single-cell multi-omics data.

Results: TCRscape integrates full-length TCR sequence data with gene expression profiles and surface protein expression to enable multimodal clustering of αβ and γδ T-cell populations. It also outputs Seurat-compatible matrices, facilitating downstream visualization and analysis in standard single-cell analysis environments.

Discussion: By bridging clonotype detection with immune cell transcriptome, proteome, and antigen specificity profiling, TCRscape supports rapid identification of dominant T-cell clones and their functional phenotypes, offering a powerful resource for immune monitoring and TCR-engineered therapeutic development. TCRscape can be found at https://github.com/Perik-Zavodskii/TCRscape/.
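
To make the notion of clonotype quantification concrete, here is a small sketch of how dominant clones might be tallied from per-cell TCR calls. It does not use or reproduce the TCRscape API: the record layout, the CDR3 sequences, and the choice to define a clonotype as a paired alpha/beta CDR3 amino-acid sequence are all assumptions for illustration.

```python
from collections import Counter


def clonotype_table(records):
    """Count cells per clonotype and return (clonotype, cells, fraction), largest first.

    A clonotype is assumed here to be the pair (alpha CDR3, beta CDR3).
    """
    counts = Counter((alpha, beta) for _cell, alpha, beta in records)
    total = sum(counts.values())
    return [(clone, n, n / total) for clone, n in counts.most_common()]


# Hypothetical per-cell records: (cell_id, alpha-chain CDR3, beta-chain CDR3).
cells = [
    ("cell_1", "CAVRDSNYQLIW", "CASSLGQGYEQYF"),
    ("cell_2", "CAVRDSNYQLIW", "CASSLGQGYEQYF"),
    ("cell_3", "CAASGGSYIPTF", "CASSPTGGNEQFF"),
    ("cell_4", "CAVRDSNYQLIW", "CASSLGQGYEQYF"),
]

table = clonotype_table(cells)
top_clone, top_count, top_fraction = table[0]
print(top_clone, top_count, top_fraction)
```

In a real analysis the clonotype key would usually also account for V/J gene usage and for cells with missing or multiple chains, which this sketch ignores.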

Citations: 0
Protein cleaver: an interactive web interface for in silico prediction and systematic annotation of protein digestion-derived peptides.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-09-04 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1576317
Grigorios Koulouras, Yingrong Xu

Proteolytic digestion is an essential process in mass spectrometry-based proteomics for converting proteins into peptides, and is therefore crucial for protein identification and quantification. In a typical proteomics experiment, digestion reagents are selected without prior evaluation of their optimality for detecting proteins or peptides of interest, partly due to the lack of comprehensive and user-friendly predictive tools. In this work, we introduce Protein Cleaver, a web-based application that systematically assesses regions of proteins that are likely or unlikely to be identified, along with extensive sequence and structure annotation and visualization features. We showcase practical examples of Protein Cleaver's usability in drug discovery and highlight proteins that are typically difficult to detect using the most common proteolytic enzymes. We evaluate trypsin and chymotrypsin for identifying G-protein-coupled receptors and discover that chymotrypsin produces significantly more identifiable peptides than trypsin. We perform a bulk digestion analysis and assess 36 proteolytic enzymes for their ability to detect most cysteine-containing peptides in the human proteome. We anticipate Protein Cleaver to be a valuable auxiliary tool for proteomics scientists.
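
The kind of in silico digestion the abstract refers to can be illustrated with a short sketch. This is not Protein Cleaver's implementation: the simplified cleavage rules, the 7 to 30 residue length window used as a stand-in for "likely to be identified", and the toy sequence are assumptions chosen for the example.

```python
import re

# Simplified cleavage rules (assumptions for illustration):
#   trypsin:      cleaves after K or R, but not before P
#   chymotrypsin: cleaves after F, Y, or W, but not before P
RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "chymotrypsin": r"(?<=[FYW])(?!P)",
}


def digest(sequence, enzyme, min_len=7, max_len=30):
    """Cut a protein sequence at enzyme sites and keep peptides within a length window.

    The length window is a crude, assumed proxy for mass-spectrometry detectability.
    """
    peptides = [p for p in re.split(RULES[enzyme], sequence) if p]
    return [p for p in peptides if min_len <= len(p) <= max_len]


toy = "MKWVTFISLLRPAAK"  # hypothetical sequence
print(digest(toy, "trypsin", min_len=1))       # ['MK', 'WVTFISLLRPAAK']
print(digest(toy, "chymotrypsin", min_len=1))  # ['MKW', 'VTF', 'ISLLRPAAK']
```

Comparing the peptide lists produced by different enzymes for the same protein is the basic operation behind choosing a digestion reagent; real tools apply more detailed cleavage rules and allow for missed cleavages.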

Citations: 0