
Frontiers in bioinformatics: latest articles

EPheClass: ensemble-based phenotype classifier from 16S rRNA gene sequences.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-30 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1514880
Lara Vázquez-González, Carlos Peña-Reyes, Alba Regueira-Iglesias, Carlos Balsa-Castro, Inmaculada Tomás, María J Carreira

One area of bioinformatics that is currently attracting particular interest is the classification of polymicrobial diseases using machine learning (ML), with data obtained from high-throughput amplicon sequencing of the 16S rRNA gene in human microbiome samples. The microbial dysbiosis underlying these types of diseases is particularly challenging to classify, as the data is highly dimensional, with potentially hundreds or even thousands of predictive features. In addition, the imbalance in the composition of the microbial community is highly heterogeneous across samples. In this paper, we propose a curated pipeline for binary phenotype classification based on a count table of 16S rRNA gene amplicons, which can be applied to any microbiome. To evaluate our proposal, raw 16S rRNA gene sequences from samples of healthy and periodontally affected oral microbiomes that met certain quality criteria were downloaded from public repositories. In the end, a total of 2,581 samples were analysed. In our approach, we first reduced the dimensionality of the data using feature selection methods. After tuning and evaluating different machine learning (ML) models and ensembles created using Dynamic Ensemble Selection (DES) techniques, we found that all DES models performed similarly and were more robust than individual models. Although the margin over other methods was minimal, DES-P achieved the highest AUC and was therefore selected as the representative technique in our analysis. When diagnosing periodontal disease with saliva samples, it achieved with only 13 features an F1 score of 0.913, a precision of 0.881, a recall (sensitivity) of 0.947, an accuracy of 0.929, and an AUC of 0.973. In addition, we used EPheClass to diagnose inflammatory bowel disease (IBD) and obtained better results than other works in the literature using the same dataset. We also evaluated its effectiveness in detecting antibiotic exposure, where it again demonstrated competitive results. 
This highlights the relevance and generalisability of our classification approach, which is applicable to different phenotypes, study niches, and sample types. The code is available at https://gitlab.citius.usc.es/lara.vazquez/epheclass.
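As a quick consistency check on the saliva results reported above, the F1 score can be recomputed from the stated precision and recall. This is a minimal sketch that assumes the standard harmonic-mean definition of F1; the function name is illustrative and is not taken from the EPheClass code.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for saliva samples.
f1 = f1_from_precision_recall(0.881, 0.947)
print(round(f1, 3))  # 0.913, consistent with the reported F1 score
```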

Frontiers in bioinformatics, vol. 5, article 1514880. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12518240/pdf/
Citations: 0
Discovering molecules and plants with potential activity against gastric cancer: an in silico ensemble-based modeling analysis.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-30 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1642039
Micaela Villacrés, Alec Avila, Karina Jimenes-Vargas, António Machado, José M Alvarez-Suarez, Eduardo Tejera

Background: Gastric cancer (GC) remains a major global health burden despite advances in diagnosis and treatment. In recent years, natural products have gained increasing attention as promising sources of anticancer agents, including agents active against GC.

Methods: In this study, we applied an in silico ensemble-based modeling strategy to predict compounds with potential inhibitory effects against four GC-related cell lines: AGS, NCI-N87, BGC-823, and SNU-16. Individual predictive models were developed using several algorithms and further integrated into two consensus ensemble multi-objective models. A comprehensive database of over 100,000 natural compounds from 21,665 plant species was screened for validation and to identify potential molecular candidates.

Results: The ensemble models demonstrated a 12- to 15-fold improvement in identifying active molecules compared to random selection. A total of 340 molecules were prioritized, many belonging to bioactive classes such as taxane diterpenoids, flavonoids, isoflavonoids, phloroglucinols, and tryptophan alkaloids. Known anticancer compounds, including paclitaxel, orsaponin (OSW-1), glycybenzofuran, and glyurallin A, were successfully retrieved, reinforcing the validity of the approach. Species from the genera Taxus, Glycyrrhiza, Elaphoglossum, and Seseli emerged as particularly relevant sources of bioactive candidates.

Conclusion: While some genera, such as Taxus and Glycyrrhiza, have well-documented anticancer properties, others, including Elaphoglossum and Seseli, require further experimental validation. These findings highlight the potential of combining multi-objective ensemble modeling with natural product databases to discover novel phytochemicals relevant to GC treatment.
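The reported fold improvement over random selection is commonly expressed as an enrichment factor: the hit rate among prioritized molecules divided by the hit rate in the whole library. The sketch below assumes that definition, which the abstract does not spell out; the function name and all counts are hypothetical and are not the study's data.

```python
def enrichment_factor(hits_selected: int, n_selected: int,
                      actives_total: int, n_total: int) -> float:
    """Hit rate in the selected subset divided by the library-wide hit rate."""
    if min(n_selected, actives_total, n_total) <= 0:
        raise ValueError("counts must be positive")
    # Algebraically equal to (hits_selected / n_selected) / (actives_total / n_total),
    # arranged to avoid dividing two rounded floating-point ratios.
    return (hits_selected * n_total) / (n_selected * actives_total)


# Hypothetical example: 60 actives among 500 prioritized molecules,
# drawn from a library of 100,000 molecules containing 1,000 actives.
print(enrichment_factor(60, 500, 1_000, 100_000))  # 12.0
```

An enrichment factor of 1 corresponds to random selection, so values of 12 to 15 would match the range quoted in the results.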

Frontiers in bioinformatics, vol. 5, article 1642039. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12518311/pdf/
Citations: 0
Integrative machine learning and transcriptomic analysis identifies key molecular targets in MNPN-associated oral squamous cell carcinoma pathogenesis.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-25 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1664576
Xiangjun Wang, Panpan Jin, Juan Xu, Junyi Li, Mengzhen Ji

Background: Oral squamous cell carcinoma (OSCC) represents a significant global health challenge, with betel nut consumption being a major risk factor. 3-(methylnitrosamino)propionitrile (MNPN), a betel nut-derived nitrosamine, has been identified as a potential carcinogen, but its molecular targets in OSCC pathogenesis remain poorly understood.

Methods: We employed a comprehensive computational framework integrating target prediction, transcriptomic analysis, weighted gene co-expression network analysis (WGCNA), and machine learning approaches. Four OSCC datasets from Gene Expression Omnibus (GEO) were analyzed, and MNPN targets were predicted using ChEMBL, PharmMapper, and SwissTargetPrediction databases. Machine learning algorithms (n = 127 combinations) were evaluated for optimal biomarker identification, with model interpretability assessed using SHAP (SHapley Additive exPlanations) analysis.

Results: Target prediction identified 881 potential MNPN targets across three databases. WGCNA revealed 534 OSCC-associated differentially expressed genes, with 38 overlapping MNPN targets. Machine learning optimization identified 13 hub genes, with PLAU demonstrating the highest predictive performance (AUC = 0.944). SHAP analysis confirmed PLAU and PLOD3 as the most influential contributors to disease prediction. Functional enrichment analysis revealed MNPN targets' involvement in xenobiotic response, hypoxic conditions, and aberrant tissue remodeling.

Conclusion: This study provides the first comprehensive molecular characterization of MNPN-associated OSCC pathogenesis, identifying PLAU as a critical therapeutic target with exceptional diagnostic potential. Our findings establish a foundation for developing targeted interventions for betel nut nitrosamine-associated oral cancers and demonstrate the power of integrative computational approaches in environmental carcinogen research.
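The overlap step described in the results (predicted MNPN targets intersected with OSCC-associated genes) can be sketched with plain set operations. This assumes the per-database predictions are pooled as a union before intersecting, which the abstract does not state; the function name and gene memberships below are toy values for illustration only.

```python
def overlapping_targets(target_sets, disease_genes):
    """Pool predicted targets across databases, then keep those that are disease genes."""
    predicted = set().union(*target_sets)
    return sorted(predicted & set(disease_genes))


# Toy gene sets standing in for the three prediction databases (hypothetical memberships).
chembl = {"PLAU", "EGFR", "MMP9"}
pharmmapper = {"PLOD3", "EGFR", "CA2"}
swiss_target_prediction = {"PLAU", "SRC"}

# Toy stand-in for the WGCNA-derived OSCC-associated genes.
oscc_genes = {"PLAU", "PLOD3", "MMP9", "TP63"}

print(overlapping_targets([chembl, pharmmapper, swiss_target_prediction], oscc_genes))
# ['MMP9', 'PLAU', 'PLOD3']
```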

Frontiers in bioinformatics, vol. 5, article 1664576. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12508658/pdf/
Citations: 0
Computational drug repurposing reveals Alectinib as a potential lead targeting Cathepsin S for therapeutic developments against cancer and chronic pain.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-24 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1666573
Mohammed Alrouji, Mohammed S Alshammari, Sharif Alhajlah, Syed Tasqeeruddin, Khuzin Dinislam, Anas Shamsi, Saleha Anwar

Cathepsin S (CathS) is a cysteine protease known to play a role in extracellular matrix (ECM) remodelling, antigen presentation, immune cell polarisation, cancer progression, and chronic pain pathophysiology. CathS also promotes an immunosuppressive environment in solid tumors and is involved in nociceptive signaling. Although several small-molecule inhibitors with favorable in vivo properties have been developed, their clinical utility is limited by resistance, off-target effects, and suboptimal efficacy. Therefore, alternative therapeutic strategies are urgently needed. In the present study, we used an integrated virtual screening protocol to screen 3,500 commercially available FDA-approved drug molecules from DrugBank against the CathS crystal structure, followed by drug-likeness profiling and interaction studies to filter putative candidates. Alectinib emerged as a top hit and had significant interactions with the important active-site residues His278 and Cys139. PASS predictions suggested relevant anticancer and anti-pain activities for Alectinib relative to the control inhibitor Q1N. Subsequently, 500-ns molecular dynamics simulations with the CHARMM36 force field showed that the CathS-Alectinib complex maintained its structural stability, as indicated by conformational parameters, hydrogen-bond persistence, and essential dynamics analyses. MM-PBSA calculations further confirmed a favorable binding free energy (ΔG = -20.16 ± 2.59 kcal/mol) dominated by van der Waals and electrostatic contributions. These computational findings suggest that Alectinib may have potential as a repurposed CathS inhibitor, warranting further experimental testing in relevant cancer and chronic pain models. Notably, these results are based solely on computational analysis and require empirical validation.
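For readers unfamiliar with MM-PBSA, the binding free energy quoted above is conventionally obtained as a difference of ensemble-averaged free energies and then decomposed into energy terms. The equations below give the general textbook form, not the authors' exact protocol; in particular, the abstract does not say whether the entropy term was included, and it is often omitted in practice.

```latex
\begin{align}
\Delta G_{\text{bind}} &= G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right) \\
\Delta G_{\text{bind}} &\approx \Delta E_{\text{vdW}} + \Delta E_{\text{elec}}
  + \Delta G_{\text{polar}} + \Delta G_{\text{nonpolar}} - T\,\Delta S
\end{align}
```

Here $\Delta E_{\text{vdW}}$ and $\Delta E_{\text{elec}}$ are the molecular-mechanics van der Waals and electrostatic terms that the abstract reports as dominant, and $\Delta G_{\text{polar}}$ and $\Delta G_{\text{nonpolar}}$ are the Poisson-Boltzmann and surface-area solvation terms.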

Frontiers in bioinformatics, vol. 5, article 1666573. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12504298/pdf/
Citations: 0
Extracting a COVID-19 signature from a multi-omic dataset.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-22 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1645785
Baptiste Bauvin, Thibaud Godon, Guillaume Bachelot, Claudia Carpentier, Riikka Huusaari, Maxime Deraspe, Juho Rousu, Caroline Quach, Jacques Corbeil

Introduction: The complexity of COVID-19 requires approaches that extend beyond symptom-based descriptors. Multi-omic data, combining clinical, proteomic, and metabolomic information, offer a more detailed view of disease mechanisms and biomarker discovery.

Methods: As part of a large-scale Quebec initiative, we collected extensive datasets from COVID-19 positive and negative patient samples. Using a multi-view machine learning framework with ensemble methods, we integrated thousands of features across clinical, proteomic, and metabolomic domains to classify COVID-19 status. We further applied a novel feature relevance methodology to identify condensed signatures.

Results: Our models achieved a balanced accuracy of 89% ± 5% despite the high-dimensional nature of the data. Feature selection yielded 12- and 50-feature signatures that improved classification accuracy by at least 3% compared to the full feature set. These signatures were both accurate and interpretable.

Discussion: This work demonstrates that multi-omic integration, combined with advanced machine learning, enables the extraction of robust COVID-19 signatures from complex datasets. The condensed biomarker sets provide a practical path toward improved diagnosis and precision medicine, representing a significant advancement in COVID-19 biomarker discovery.
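The headline metric in the results, balanced accuracy, is worth making concrete because it differs from plain accuracy when classes are unequal in size. The sketch below assumes the standard binary definition (mean of sensitivity and specificity); the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical counts: 50 positive and 200 negative samples.
score = balanced_accuracy(tp=45, fn=5, tn=176, fp=24)
print(f"{score:.2f}")  # 0.89 (sensitivity 0.90, specificity 0.88)
```

With these counts, plain accuracy would be (45 + 176) / 250 = 0.884, so the two metrics diverge slightly even at this modest class imbalance.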

Frontiers in bioinformatics, vol. 5, article 1645785. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12497780/pdf/
Citations: 0
Association between dysregulated expression of Ca²⁺ and ROS-related genes and breast cancer patient survival.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-22 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1633494
Sofia Ramos, João Gregório, Ana Sofia Fernandes, Nuno Saraiva

The intricate interplay between Ca²⁺ and reactive oxygen species (ROS) signalling systems influences numerous cellular pathways. Dysregulated expression of genes associated with Ca²⁺ and ROS homeostasis can significantly impact cancer progression. Despite extensive research, many of the underlying mechanisms remain elusive, and a comprehensive unified perspective is lacking. Breast cancer (BC) remains the leading cause of cancer-related deaths among women, highlighting the pressing need to discover novel regulatory mechanisms, therapeutic targets, and potential biomarkers. In this study, we employed a bioinformatic approach based on data from The Cancer Genome Atlas to assess the association between combined dysregulation of specific pairs of genes involved in redox- or Ca²⁺-related cellular homeostasis and patient outcome. These genes were selected based on differences in their expression between normal and tumour tissues and on their individual association with patient survival rates. Cumulative proportion survival at 5 years post-diagnosis was calculated for each quartile of expression within the population exhibiting either high or low expression of a second gene. Additional genes with expression positively or negatively correlated with the set of relevant gene pairs were identified, and a gene enrichment analysis was performed. Our results show that the simultaneous dysregulation of a selected number of gene pairs is substantially associated with BC patient survival. Notably, the expression dysregulation of these gene pairs is associated with altered expression of genes linked to cell cycle regulation, cell adhesion, and cell projection processes. This approach exhibits significant potential to identify new prognostic biomarkers or drug targets for BC.
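The 5-year survival proportion within one expression stratum can be estimated as sketched below. The abstract does not name the estimator, so Kaplan-Meier is swapped in here as an assumption; the function name and the follow-up data are hypothetical.

```python
def km_survival_at(times, events, horizon):
    """Kaplan-Meier estimate of the survival probability at `horizon`.

    times:  follow-up time per patient (years)
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return survival


# Hypothetical stratum, e.g. top quartile of gene A among patients with high gene B.
times = [1, 2, 3, 6, 7]
events = [1, 0, 1, 1, 0]
print(round(km_survival_at(times, events, horizon=5), 3))  # 0.533
```

Repeating this for each quartile of one gene, separately in the high and low groups of the second gene, would give the grid of 5-year proportions that the abstract describes.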

Ca2+和活性氧(ROS)信号系统之间复杂的相互作用影响许多细胞途径。与Ca2+和ROS稳态相关的基因表达失调可以显著影响癌症的进展。尽管进行了广泛的研究,但各种潜在机制仍然难以捉摸,缺乏全面统一的观点。乳腺癌(BC)仍然是女性癌症相关死亡的主要原因,因此迫切需要发现新的调节机制、治疗靶点和潜在的生物标志物。在这项研究中,我们采用了基于癌症基因组图谱数据的生物信息学方法来评估参与氧化还原或Ca2+相关细胞稳态的特定基因对的联合失调与患者预后之间的关系。这些基因是根据正常组织和肿瘤组织之间的表达差异以及它们与患者存活率的个体关联来选择的。计算第二基因高表达或低表达人群中每个四分位数的5年后累积比例生存率。鉴定出与相关基因对组表达正相关或负相关的其他基因,并进行基因富集分析。我们的研究结果表明,一些基因对的同时失调与BC患者的生存有很大的关系。值得注意的是,这些基因对的表达失调与细胞周期调节、细胞粘附和细胞投射过程相关基因的表达改变有关。这种方法在确定新的预后生物标志物或BC的药物靶点方面显示出巨大的潜力。
Citations: 0
Integrated multi-optosis model for pan-cancer candidate biomarker and therapy target discovery.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-19 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1630518
Emanuell Rodrigues de Souza, Higor Almeida Cordeiro Nogueira, Ronaldo da Silva Francisco Junior, Ana Beatriz Garcia, Enrique Medina-Acosta

Regulated cell death (RCD) is fundamental to tissue homeostasis and cancer progression, influencing therapeutic responses across tumor types. Although individual RCD forms have been extensively studied, a comprehensive framework integrating multiple RCD processes has been lacking, limiting systematic biomarker discovery. To address this gap, we developed a multi-optosis model that incorporates 25 distinct RCD forms and integrates multi-omic and phenotypic data across 33 cancer types. This model enables the identification of candidate biomarkers with translational relevance through genome-wide significant associations. We analyzed 9,385 tumor samples from The Cancer Genome Atlas (TCGA) and 7,429 non-tumor samples from the Genotype-Tissue Expression (GTEx) database, accessed via UCSCXena. Our analysis involved 5,913 RCD-associated genes, spanning 62,090 transcript isoforms, 882 mature miRNAs, and 239 cancer-associated proteins. Seven omic features (protein expression, mutation, copy number variation, miRNA expression, transcript isoform expression, mRNA expression, and CpG methylation) were correlated with seven clinical phenotypic features: tumor mutation burden, microsatellite instability, tumor stemness metrics, hazard ratio contexture, prognostic survival metrics, tumor microenvironment contexture, and tumor immune infiltration contexture. We performed over 27 million pairwise correlations, resulting in 44,641 multi-omic RCD signatures. These signatures capture both unique and overlapping associations between omic and phenotypic features. Apoptosis-related genes were recurrent across most signatures, reaffirming apoptosis as a central node in cancer-related RCD. Notably, isoform-specific signatures were prevalent, indicating critical roles for alternative splicing and promoter usage in cancer biology. For example, MAPK10 isoforms showed distinct phenotypic correlations, while COL1A1 and UMOD displayed gene-level coordination in regulating tumor stemness. Notably, 879 multi-omic signatures include chimeric antigen targets currently under clinical evaluation, underscoring the translational relevance of our findings for precision oncology and immunotherapy. This integrative resource is publicly available via CancerRCDShiny (https://cancerrcdshiny.shinyapps.io/cancerrcdshiny/), supporting future efforts in biomarker discovery and therapeutic target development across diverse cancer types.
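
The pairwise correlation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient was used, so Spearman rank correlation is an assumption here, and the feature names and values are toy placeholders rather than data from the study.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_correlations(omic, phenotype):
    """Correlate every omic feature with every phenotypic feature."""
    return {
        (o_name, p_name): spearman(omic[o_name], phenotype[p_name])
        for o_name, p_name in product(omic, phenotype)
    }


# Toy placeholder values, one entry per sample (not real data).
toy_omic = {
    "feature_A_mRNA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_B_methylation": [5.0, 4.0, 3.0, 2.0, 1.0],
}
toy_phenotype = {"tumor_mutation_burden": [2.0, 4.0, 6.0, 8.0, 10.0]}

for pair, rho in pairwise_correlations(toy_omic, toy_phenotype).items():
    print(pair, round(rho, 3))
```

At the scale described (tens of millions of pairs), a vectorised implementation and multiple-testing correction would be needed; this sketch only shows the shape of the computation.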

Citations: 0
Identification and therapeutic investigation of biomarker genes underpinning hepatocellular carcinoma: an in silico study utilising molecular docking and dynamics simulation.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-19 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1567748
Jishnu Ghosh, Abdullah M Alshahrani, Aritra Palodhi, Debarghya Bhattacharyya, Subhadip Das, Sunil Kanti Mondal, Abul Kalam, S Rehan Ahmad, Chittabrata Mal

Background: Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related mortality globally, and ranks fifth in terms of incidence. It primarily affects males and has a high prevalence in Asia. Risk factors include hepatitis B and C, liver cirrhosis, nonalcoholic fatty liver disease (NAFLD), and alcohol consumption. Late-stage diagnosis results in a poor survival rate of approximately 20%, underscoring the need for early detection methods to improve the survival rates. This study aimed to identify prognostic biomarkers for HCC through bioinformatic analysis of microarray datasets, providing insights into potential therapeutic targets.

Methods: We analyzed five microarray datasets, comprising 402 HCC samples and 121 control samples. To identify relevant biological pathways, we conducted differential gene expression, Gene Ontology (GO), and KEGG pathway enrichment analyses. We identified hub genes and quantitatively assessed transcription factors and microRNAs targeting these genes. Additionally, molecular docking and dynamic simulations (100 ns) were used to identify potential drug candidates capable of inhibiting the activity of differentially expressed hub genes.

Results: Our bioinformatic approach identified several promising HCC biomarkers. Among these, CDK1/CKS2 was identified as a key therapeutic target, with a regulatory role in HCC pathogenesis, suggesting its potential for further investigation. Digoxin (DB00390) has been highlighted as a potential repurposed drug candidate because of its favorable drug-likeness and stability, as confirmed by virtual screening, ADMET analysis, molecular docking study and dynamic simulations.

Conclusion: This study enhances our understanding of HCC biology and offers new insights into drug interactions. It presents several promising biomarkers for early diagnosis, prognosis, and therapy. Further investigation into CDK1/CKS2 as a therapeutic target and the role of the identified biomarkers could contribute to improved diagnostic and therapeutic strategies for HCC.
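
The hub-gene step in the Methods can be sketched as follows. The abstract does not say how hub genes were selected; ranking genes by degree (number of distinct interaction partners) in an interaction network is one common approach and is assumed here. The gene labels G1 to G4 and the edge list are hypothetical placeholders, not interactions reported by the study.

```python
from collections import Counter


def rank_hub_genes(edges, top_n=3):
    """Rank genes by degree in an undirected interaction network.

    Duplicate edges (in either direction) are counted once and
    self-loops are ignored. Ties are broken alphabetically.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for gene in edge:
            degree[gene] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical toy network (placeholder labels, not real interactions).
toy_edges = [
    ("G1", "G2"),
    ("G1", "G3"),
    ("G1", "G4"),
    ("G2", "G3"),
    ("G3", "G1"),  # duplicate of ("G1", "G3") in reverse order
    ("G4", "G4"),  # self-loop, ignored
]

print(rank_hub_genes(toy_edges, top_n=2))
```

Other centrality measures (betweenness, closeness) are also used for hub detection; degree is chosen here only because it is the simplest to show.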

Citations: 0
Interpretable artificial intelligence based on immunoregulation-related genes predicts prognosis and immunotherapy response in lung adenocarcinoma.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-19 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1613761
Minghao Wang, Yu Wang, Yitong Li, Chengyi Zhang, Canjun Li, Nan Bi

Introduction: Lung adenocarcinoma (LUAD) is the most common subtype of non-small cell lung cancer, and its benefit from immune checkpoint inhibitors (ICIs) is controversial, especially for patients without driver gene mutations. The potential of immunoregulation-related genes (IRGs) in predicting the prognosis of LUAD and the efficacy of immunotherapy is emerging. There is an urgent need to establish a reliable IRG-based predictive model of ICI response.

Methods: LUAD RNA sequencing data and clinical data were extracted from the GEO database and merged. The differences in genomic and tumor microenvironment (TME) cell infiltration landscape between normal lung tissue and tumor tissue were comprehensively analyzed. Unsupervised consistent cluster analysis based on genes related to immune regulation was performed on the samples. The ESTIMATE and TIMER algorithms were used to analyze the infiltration of immune cells in different groups, and the TIDE score was used to evaluate the effectiveness of immunotherapy. Then, lasso regression was used to establish a prognostic model based on the identified key IRGs. An XGBoost machine learning model was further developed, with SHapley Additive exPlanations (SHAP) used to interpret the model.

Results: The GEO LUAD cohort was divided into two clusters based on IRG expression, with significantly better survival outcomes and immune cell infiltration in the IRG-high group compared to the IRG-low group. TIDE scores indicated that the group with the high IRG pattern showed a better response to ICI treatment. We then developed an IRG index (IRGI) model based on two identified key IRGs, GREM1 and PLAU. IRGI effectively divided patients into high-risk and low-risk groups, revealing significant differences in prognosis, mutational profiles, and immune cell infiltration in the TME between the two groups. Subsequently, the interpretable XGBoost machine learning model established based on IRGs further improved predictive performance (AUC = 0.975), and SHAP analysis demonstrated that GREM1 had the greatest impact on the overall prediction.

Discussion: IRGI can be used as a valuable biomarker to predict LUAD patient prognosis and response to ICIs. IRGs play a crucial role in shaping the diversity and complexity of TME cell infiltration, which may provide valuable guidance for ICI treatment decisions for LUAD patients.
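
A two-gene risk index of the kind described in the Results is often computed as a weighted sum of expression values, with patients split at a cutoff. The sketch below assumes that form and a median cutoff; neither is confirmed by the abstract. The coefficients and patient values are hypothetical placeholders, not the fitted lasso coefficients or data from the study.

```python
from statistics import median

# Hypothetical placeholder weights, NOT the coefficients from the paper.
HYPOTHETICAL_COEFFICIENTS = {"GREM1": 0.5, "PLAU": 0.25}


def risk_index(expression, coefficients):
    """Weighted sum of gene expression values for one patient."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


def stratify_by_median(patients, coefficients):
    """Score each patient and label scores above the median as high-risk."""
    scores = {
        patient_id: risk_index(expression, coefficients)
        for patient_id, expression in patients.items()
    }
    cutoff = median(scores.values())
    groups = {
        patient_id: "high-risk" if score > cutoff else "low-risk"
        for patient_id, score in scores.items()
    }
    return scores, cutoff, groups


# Toy expression values for four made-up patients (not real data).
toy_patients = {
    "P1": {"GREM1": 2.0, "PLAU": 4.0},
    "P2": {"GREM1": 4.0, "PLAU": 4.0},
    "P3": {"GREM1": 1.0, "PLAU": 2.0},
    "P4": {"GREM1": 6.0, "PLAU": 8.0},
}

scores, cutoff, groups = stratify_by_median(toy_patients, HYPOTHETICAL_COEFFICIENTS)
print(scores, cutoff, groups)
```

In practice the cutoff choice (median, optimal cut point, or a fixed threshold) affects group sizes and should follow whatever the full paper specifies.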

Citations: 0
Quantitative measures to assess the quality of cellular indexing of transcriptomes and epitopes by sequencing data.
IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-09-18 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1630161
Jie Sun, Robert Morrison, Soyeon Kim, Kairuo Yan, Hyun Jung Park

Background: Cellular indexing of transcriptomes and epitopes by sequencing (CITE-Seq) is a powerful technique to simultaneously measure gene expression and cell surface protein abundances in individual cells. To obtain accurate and reliable biological findings from CITE-Seq data, it is critical to ensure rigorous quality control (QC). However, no public method has yet been developed for CITE-Seq QC.

Results: In this study, we propose the first software package for multi-layered, systemic, and quantitative quality control (CITESeQC). Recognizing the multi-layered nature of CITE-Seq data, CITESeQC performs QC across gene expressions, surface proteins, and their interactions. It systemically evaluates all genes and protein markers assayed in the data and filters out some of them based on individual quality measures. Furthermore, for quantitative QC that enables objective and standardized analyses, CITESeQC quantifies cell type-specific expression of genes and surface proteins using Shannon entropy and correlation-based measures. Finally, to ensure broad applicability, CITESeQC guides users through a simple process that generates a complete markdown report with supporting figures and explanations, requiring minimal user intervention.

Conclusion: By quantifying the quality of CITE-Seq data, CITESeQC enables precise characterization of gene expression within cell types and reliable classification of cell types using surface protein markers, thereby enhancing its value for clinical applications.
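
The idea of scoring cell type-specific expression with Shannon entropy can be sketched as below. This is an illustration of the general measure, not CITESeQC's actual API or exact definition: the function name, the use of per-cell-type mean expression, and the toy values are all assumptions. A marker expressed in a single cell type gives entropy 0, while a marker spread evenly over k cell types gives log2(k).

```python
from math import log2


def shannon_entropy(mean_expression_by_cell_type):
    """Shannon entropy (in bits) of a marker's expression across cell types.

    The per-cell-type mean expression values are normalised to a
    probability distribution; zero entries contribute nothing.
    """
    total = sum(mean_expression_by_cell_type.values())
    if total <= 0:
        raise ValueError("marker has no expression in any cell type")
    entropy = 0.0
    for value in mean_expression_by_cell_type.values():
        if value > 0:
            p = value / total
            entropy += p * log2(1 / p)
    return entropy


# Toy mean expression per cell type for two made-up markers.
specific_marker = {"T cell": 8.0, "B cell": 0.0, "NK cell": 0.0, "Monocyte": 0.0}
uniform_marker = {"T cell": 2.0, "B cell": 2.0, "NK cell": 2.0, "Monocyte": 2.0}

print(shannon_entropy(specific_marker))  # low entropy: cell type-specific
print(shannon_entropy(uniform_marker))   # high entropy: not specific
```

A quality filter could then keep markers whose entropy falls below a chosen threshold; any such threshold would be a user decision rather than something stated in the abstract.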

Citations: 0