Pub Date: 2026-01-06; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1725145
Alexander Hunt, Holger Schulze, Kay Samuel, Robert B Fisher, Till T Bachmann
Background: Rapid detection of bacterial infections through leukocyte activation analysis could significantly reduce diagnostic timeframes from days to hours. Traditional methods like flow cytometry and biomarker assays face limitations including technical complexity, equipment requirements, and delayed results.
Methods: We developed a dual artificial neural network system combining stain-free light microscopy with microfluidic technology to detect morphological changes in activated leukocytes. YOLOv4 networks were trained using five-fold cross-validation on images of chemically stimulated leukocyte subpopulations (lymphocytes, monocytes, and neutrophils) and validated against flow cytometry. The system was tested on whole blood samples spiked with E. coli at clinically relevant concentrations (10-250 CFU/mL).
Results: The optimized four-class network achieved high performance metrics for lymphocytes (F1 score: 80.1% ± 2.5%) and neutrophils (F1 score: 91.7% ± 1.7%), while a specialized binary classifier for monocytes achieved 97.0% ± 2.8% F1 score. In bacteria-spiked whole blood experiments, the system successfully detected activated leukocytes within 30 min at concentrations approaching clinical blood culture detection limits (11.11 ± 4.79 CFU/mL). Neutrophils showed rapid activation peaking at 1-3 h, while lymphocyte activation increased gradually over 6-12 h, consistent with innate versus adaptive immune response kinetics.
Conclusion: This AI-assisted microscopy platform enables rapid, label-free detection of leukocyte activation in response to bacterial infection with minimal sample handling and no requirement for specialized staining or trained technicians. The technology demonstrates potential for accelerating infection diagnosis and could be extended to other pathologies with morphological manifestations.
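The "mean ± spread" F1 figures reported above come from five-fold cross-validation. As a rough illustration of how such a figure is formed, the sketch below computes a per-fold F1 score from true-positive, false-positive, and false-negative counts and then summarizes the folds. The counts are invented for illustration and are not data from the study, and the use of the sample standard deviation is an assumption (the abstract does not say which spread measure was used).

```python
from statistics import mean, stdev


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical (TP, FP, FN) counts for one cell class in each of five folds.
# These numbers are made up for the example; they are NOT from the paper.
folds = [(90, 8, 10), (88, 12, 9), (93, 7, 6), (85, 10, 14), (91, 9, 8)]

scores = [f1_score(tp, fp, fn) for tp, fp, fn in folds]

# Sample standard deviation across folds (an assumption about the spread measure).
print(f"F1 = {100 * mean(scores):.1f}% ± {100 * stdev(scores):.1f}%")
```

Reporting the spread alongside the mean matters here because a class with few examples per fold (such as monocytes) can show a high mean F1 with noticeably larger fold-to-fold variation.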
Title: Stain-free artificial intelligence-assisted light microscopy for the identification of leukocyte morphology change in presence of bacteria. Journal: Frontiers in bioinformatics 5:1725145. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12816271/pdf/
Pub Date: 2026-01-05; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1694924
Omar Abdelwahab, Davoud Torkamaneh
Accurate variant calling refinement is crucial for distinguishing true genetic variants from technical artifacts in high-throughput sequencing data. While heuristic filtering and manual review are common approaches for refining variants, manual review is time-consuming, and heuristic filtering often lacks optimal solutions, especially for low-coverage data. Traditional variant calling methods often struggle with accuracy, especially in regions of low read coverage, leading to false-positive or false-negative calls. Advances in artificial intelligence, particularly deep learning, offer promising solutions for automating this refinement process. Here, we present a Transformers-based framework for genetic variant refinement that leverages self-attention to model dependencies among variant features and directly processes VCF files, enabling seamless integration with standard pipelines such as BCFTools and GATK4. Trained on 2 million variants from the GIAB (v4.2.1) sample HG003, the framework achieved 89.26% accuracy and a ROC AUC of 0.88. Across the tested samples, VariantTransformer improved baseline filtering accuracy by 4%-10%, demonstrating consistent gains over the default caller filters. When integrated into conventional variant calling pipelines, VariantTransformer outperformed traditional heuristic filters and, through refinement of existing caller outputs, approached the accuracy achieved by state-of-the-art AI-based variant callers such as DeepVariant, despite not operating as a standalone caller. By positioning this work as a flexible and generalizable framework rather than a single-use model, we highlight the underexplored potential of Transformers for variant refinement in genomics. This study contributes a blueprint for adapting Transformer architectures to a wide range of genomic quality control and filtering tasks. Code is available at: https://github.com/Omar-Abd-Elwahab/VariantTransformer.
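The abstract says the framework processes VCF files directly and models dependencies among variant features. As a minimal sketch of the first step such a refinement model needs, the function below turns one VCF data line into a numeric feature vector. The choice of features (QUAL plus the INFO keys DP, MQ, and QD) and the zero-fill for missing values are assumptions made for illustration; the abstract does not state which features VariantTransformer uses, and this is not the authors' code.

```python
def vcf_record_to_features(line: str, info_keys=("DP", "MQ", "QD")) -> list[float]:
    """Convert one tab-separated VCF data line into [QUAL, INFO values...]."""
    fields = line.rstrip("\n").split("\t")

    # Column 6 (index 5) is QUAL; "." means missing in VCF.
    qual = float(fields[5]) if fields[5] != "." else 0.0

    # Column 8 (index 7) is INFO: semicolon-separated key=value pairs or bare flags.
    info = {}
    for item in fields[7].split(";"):
        key, _, value = item.partition("=")
        info[key] = value  # bare flags (no "=") map to an empty string

    features = [qual]
    for key in info_keys:
        try:
            features.append(float(info.get(key, "")))
        except ValueError:
            # Missing or non-numeric value: fill with 0.0 (an illustrative choice).
            features.append(0.0)
    return features


# Hypothetical record, written for this example only.
record = "chr1\t12345\t.\tA\tG\t50.2\tPASS\tDP=14;MQ=60.0;QD=3.6;DB"
print(vcf_record_to_features(record))
```

A real pipeline would more likely read records with a VCF library and normalize each feature before passing it to the model; the hand-rolled parsing here only keeps the example self-contained.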
Title: A Transformers-based framework for refinement of genetic variants. Journal: Frontiers in bioinformatics 5:1694924. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12813134/pdf/
Pub Date: 2026-01-05; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1691056
Jinnan Hu, Donald Freed, Hanying Feng, Hong Chen, Zhipan Li, Haodong Chen
Background: Integrating short- and long-read sequencing technologies has become a promising approach for achieving accurate and comprehensive genomic analysis. Although short-read sequencing (Illumina, etc.) offers high base accuracy and cost efficiency, it struggles with structural variant (SV) detection and complex genomic regions. In contrast, long-read sequencing (PacBio HiFi) excels in resolving large SVs and repetitive sequences but is limited by throughput, higher insertion or deletion (indel) error rates, and sequencing costs. Hybrid approaches may combine these technologies and leverage their complementary strengths and different sources of error to provide higher accuracy, more comprehensive results, and higher throughput by lowering the coverage requirement for the long reads.
Methods: This study benchmarks the DNAscope Hybrid (DS-Hybrid) pipeline, a novel integrated alignment and variant calling framework that combines short- and long-read data sequenced from the same sample. The pipeline runs on generic x86 CPUs. We evaluate its performance across multiple human genome reference datasets (HG002-HG004) using the draft Q100 and Genome in a Bottle v4.2.1 benchmarks. The pipeline's ability to detect small variants [single-nucleotide polymorphisms (SNPs) and indels], SVs, and copy-number variations (CNVs) is assessed using data from the Illumina and PacBio sequencing systems at varying read depths (5×-30×). Benchmark results are compared to those of DeepVariant.
Results: The DNAscope Hybrid pipeline significantly improves SNP and indel calling accuracy, particularly in complex genomic regions. At lower long-read depths (e.g., 5×-10×), the hybrid approach outperforms stand-alone short- or long-read pipelines at full sequencing depths (30×-35×), reducing variant calling errors by at least 50%. Additionally, DNAscope Hybrid outperforms leading open-source tools for SV and CNV detection and enhances variant discovery in challenging genomic regions. The pipeline also demonstrates clinical utility by identifying variants in disease-associated genes. Moreover, DNAscope Hybrid is highly efficient, achieving runtimes of less than 90 min on a single standard instance.
Conclusion: The DNAscope Hybrid pipeline is a computationally efficient, highly accurate variant calling framework that leverages the advantages of both short- and long-read sequencing. By improving variant detection in challenging genomic regions and offering a robust solution for clinical and large-scale genomic applications, it holds significant promise for genetic disease diagnostics, population-scale studies, and personalized medicine.
Title: A novel and accelerated method for integrated alignment and variant calling from short and long reads. Journal: Frontiers in bioinformatics 5:1691056. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12813096/pdf/
Pub Date: 2026-01-02; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1719516
Susanne Zabel, Philipp Hennig, Kay Nieselt
t-distributed Stochastic Neighbour Embedding (t-SNE) is a cornerstone for visualizing high-dimensional biological data, where each high-dimensional data point is represented as a point in a two-dimensional map. However, this static map provides no information about the stability of the visual layout, the features that influence it, or the impact of uncertainty in the input data. This work introduces a computational framework that extends the standard t-SNE plot with visual cues about the stability of the t-SNE embedding. First, we perform a sensitivity analysis to determine feature influence: by combining the Implicit Function Theorem with automatic differentiation, our method computes the sensitivity of the embedding with respect to the input data, expressed as a Jacobian of first-order derivatives. Heatmap visualizations of this Jacobian, or summaries thereof, reveal which input features are most influential in shaping the embedding and identify regions of structural instability. Second, when input data uncertainty is available, our framework uses this Jacobian to propagate error, probabilistically quantifying the positional uncertainty of each embedded point. This uncertainty is visualized by augmenting the plot with hypothetical outcomes, which display the positional confidence of each point. We apply our framework to three diverse biological datasets (bulk RNA-seq, proteomics, and single-cell transcriptomics), demonstrating its ability to directly link visual patterns to their underlying biological drivers and reveal ambiguities invisible in a standard plot. By providing this principled means to assess the robustness and interpretability of t-SNE visualizations, our work enables more rigorous and informed scientific conclusions in bioinformatics.
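The sensitivity and error-propagation steps described above can be written out in a few lines. The derivation below is a generic sketch of how the Implicit Function Theorem yields such a Jacobian, not the authors' exact formulation. The symbols are assumptions introduced here: $X$ is the input data, $Y^{*}$ the converged embedding, $C(Y, X)$ the t-SNE objective (the Kullback-Leibler divergence between input and embedding similarities), and $\Sigma_X$ the input covariance.

```latex
% Stationarity of the converged embedding:
\nabla_Y C\bigl(Y^{*}(X),\, X\bigr) = 0

% Differentiating with respect to X (Implicit Function Theorem),
% assuming the Hessian \nabla^2_{YY} C is invertible at Y^*:
J \;=\; \frac{\partial Y^{*}}{\partial X}
  \;=\; -\bigl(\nabla^{2}_{YY} C\bigr)^{-1}\, \nabla^{2}_{YX} C

% First-order (linear) propagation of input uncertainty to the embedding:
\Sigma_Y \;\approx\; J\, \Sigma_X\, J^{\top}
```

One caveat on the invertibility assumption: the t-SNE objective is unchanged by translating or rotating the whole map, so the Hessian is singular along those directions unless they are fixed or regularized. How the paper handles this is not stated in the abstract.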
Title: Visualizing stability: a sensitivity analysis framework for t-SNE embeddings. Journal: Frontiers in bioinformatics 5:1719516. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12808344/pdf/
Pub Date: 2025-12-19; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1735106
Loganathan Chandramani Priya Dharshini, Abul Kalam Azad Mandal
Background: Triple-negative breast cancer (TNBC) is defined by the absence of ER, PR, and HER2 expression. This limits the options for targeted therapy, resulting in poor clinical outcomes. Identifying molecular targets that can be regulated through miRNAs and natural compounds offers a potential therapeutic platform.
Methods: We combined transcriptomic profiling with miRNA target prediction to identify genes regulated by miR-30a-5p and assess their interaction with the green tea polyphenol epigallocatechin gallate (EGCG). Differentially expressed genes (DEGs) from TCGA-TNBC datasets and miRNA targets from miRDB, TargetScan, and miRTarBase were screened for common genes. Protein-protein interaction and network topology analyses were then performed to identify key hub genes. Molecular docking and simulation were carried out with the four key genes against EGCG.
Results: Data integration yielded 393 overlapping genes and identified ten hub genes: RRM2, KIF11, ANLN, CDC20, CCNA1, AGO2, YWHAZ, DTL, SKP2, and PCNA. Pathway enrichment showed that all these hubs are involved in cell cycle and mitotic regulation, which was associated with poor TNBC prognosis. Mutation profiling revealed high alteration rates in KIF11, ANLN, CDC20, and YWHAZ, with increased missense mutations and C>T transitions. Molecular docking and simulations identified YWHAZ as the most favorable and structurally stable EGCG-binding target, compared to the other three key genes.
Conclusion: The results emphasize that EGCG has strong binding affinity towards YWHAZ, revealing that miR-30a-EGCG targets TNBC synergistically through cell-cycle-mediated pathways. The findings give rational support for miRNA-guided phytochemical-based TNBC therapeutic development.
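The screening steps in the Methods (finding genes common to the DEG list and the miRNA target predictions, then ranking hub genes by network topology) can be sketched as follows. Two assumptions are made for illustration: "common genes" is read as DEGs predicted by at least one of the target databases, and hubs are ranked by node degree in the interaction network. The abstract specifies neither choice, and the gene names below are placeholders rather than real symbols.

```python
from collections import Counter


def overlap(degs: set[str], *target_sets: set[str]) -> set[str]:
    """DEGs that appear in at least one predicted-target set (assumed reading)."""
    predicted = set().union(*target_sets)
    return degs & predicted


def top_hubs(edges: list[tuple[str, str]], k: int) -> list[str]:
    """Rank genes by degree in an undirected interaction network; ties by name."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:k]]


# Placeholder inputs, invented for this example.
degs = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}
db_one = {"GENE_B", "GENE_X"}
db_two = {"GENE_C", "GENE_D"}
common = overlap(degs, db_one, db_two)

ppi_edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"),
             ("GENE_B", "GENE_C"), ("GENE_A", "GENE_D")]
print(sorted(common), top_hubs(ppi_edges, k=2))
```

Degree is only one of several topology measures (betweenness and closeness are common alternatives), so a different measure could reorder the hub list.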
Title: Network-based insights into miR-30a-5p-mediated regulation and EGCG targeting in triple-negative breast cancer. Journal: Frontiers in bioinformatics 5:1735106. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12757378/pdf/
Pub Date: 2025-12-17; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1708800
Rubén Fernández, L Francisco Lorenzo-Martín, Víctor Quesada, Xosé R Bustelo
RHO family GTPases are key regulators of cancer-related processes such as cytoskeletal dynamics and cell migration, proliferation, and survival. Despite this, a comprehensive understanding of RHO signaling alterations across tumors is still lacking. In this study, we present a pan-cancer analysis of 484 genes encoding RHO GTPases, regulators, proximal effectors, distal downstream signaling elements, and components of their proximal interactomes using data from over 10,000 tumor samples and 33 tumor types present in The Cancer Genome Atlas (TCGA). In addition, we have utilized available data from genome-wide functional dependency screens performed in more than 1,000 gene-edited cancer cell lines. This study has uncovered positively selected mutations in both well-known and previously uncharacterized RHO pathway genes. Transcriptomic profiling reveals widespread and tumor-specific differential expression patterns, with some of them correlating with copy number changes. Interestingly, certain regulators exhibit consistent expression profiles across tumors opposite to those predicted from their canonical roles. Co-expression and gene set enrichment analyses highlight coordinated transcriptional programs involving some RHO GTPase pathway genes and their linkage to key cancer hallmarks, including extracellular matrix reorganization, cell motility, cell cycle progression, cell survival, and immune modulation. Functional screens further identify context-specific dependencies on several deregulated RHO GTPase pathway genes. Altogether, this study provides a comprehensive map of RHO GTPase pathway alterations in cancer and identifies new oncogenic drivers, expression-based signatures, and therapeutic vulnerabilities that could guide future mechanistic and translational research in this area.
Title: Pan-cancer analyses identify oncogenic drivers, expression signatures, and therapeutic vulnerabilities in RHO GTPase pathway genes. Journal: Frontiers in bioinformatics 5:1708800. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12753894/pdf/
Objective: Knee osteoarthritis (KOA) is a prevalent chronic degenerative joint disease that causes chronic pain and mobility restrictions in the elderly, significantly impacting quality of life. Current treatments focus on symptom relief, lacking effective interventions targeting the underlying mechanisms. Understanding KOA's molecular mechanisms and identifying key pathogenic genes are essential for developing targeted therapies.
Methods: Gene expression data from KOA patients and healthy controls were obtained from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to reveal the associated biological processes and signaling pathways. Protein-protein interaction (PPI) network analysis and Gene Ontology-based semantic similarity calculations were used to identify hub genes. Gene Set Variation Analysis (GSVA) assessed enrichment in KOA-related pathways. Immune infiltration analysis (CIBERSORT) assessed the immune cell distribution in KOA tissues. Finally, hub gene expression changes were validated using the IL-1β-treated CHON-001 cell model and real-time quantitative PCR (RT-qPCR).
Results: A total of 3,290 upregulated and 2,536 downregulated DEGs were identified. GO and KEGG enrichment analyses revealed that these genes were primarily involved in extracellular matrix remodeling, transmembrane transport, and inflammation-related pathways. Key hub genes, including HSPA5, FOXO1, and YWHAE, were identified. GSVA showed that these genes were significantly enriched in multiple KOA-associated signaling pathways. Immune infiltration analysis revealed significant differences in the levels of six immune cell types in KOA tissues, which were associated with hub gene expression. In CHON-001 cells, the expression levels of GRB2, IKBKG, and HSPA12A were upregulated, whereas those of YWHAE, HSPB1, and DCAF8 were downregulated, consistent with the tissue samples.
Conclusion: This study identified key pathogenic genes and their regulatory pathways in KOA, highlighting their potential role in disease progression via inflammation and immune modulation. These findings provide insights for developing targeted therapeutic strategies for KOA.
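To make the DEG step concrete, the following is a minimal sketch of fold-change thresholding only. It is not the authors' pipeline: the abstract does not state which tool or cutoffs were used, so the function name `classify_degs`, the cutoff of 1.0, the pseudocount, and the toy gene names are all assumptions for illustration. A real analysis would also apply a statistical test with multiple-testing correction, which is omitted here.

```python
import math
from statistics import mean


def classify_degs(expr_case, expr_ctrl, lfc_cutoff=1.0, pseudocount=1.0):
    """Label each gene 'up', 'down', or 'ns' by log2 fold change of group means.

    expr_case / expr_ctrl map gene name -> list of expression values.
    The cutoff and pseudocount are illustrative assumptions.
    """
    labels = {}
    for gene in expr_case:
        case_mean = mean(expr_case[gene])
        ctrl_mean = mean(expr_ctrl[gene])
        # The pseudocount avoids division by zero for unexpressed genes.
        lfc = math.log2((case_mean + pseudocount) / (ctrl_mean + pseudocount))
        if lfc >= lfc_cutoff:
            labels[gene] = "up"
        elif lfc <= -lfc_cutoff:
            labels[gene] = "down"
        else:
            labels[gene] = "ns"
    return labels


# Hypothetical toy data, not taken from the study.
toy_case = {"GENE_A": [30.0, 34.0], "GENE_B": [4.0, 6.0], "GENE_C": [10.0, 12.0]}
toy_ctrl = {"GENE_A": [7.0, 9.0], "GENE_B": [22.0, 26.0], "GENE_C": [11.0, 11.0]}
deg_labels = classify_degs(toy_case, toy_ctrl)
```

Counting the "up" and "down" labels over a full expression matrix would give totals of the kind reported in the Results, subject to whatever significance filter is applied alongside the fold-change cutoff.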
{"title":"Identification and functional analysis of hub genes in knee osteoarthritis via bioinformatics and experimental validation.","authors":"Shanyong Jiang, Jingjing Cao, Jianshu Lu, Jianxiao Liang, Lianxin Li, Yanqiang Song, Jincheng Gao, Baoen Jiang","doi":"10.3389/fbinf.2025.1671693","DOIUrl":"10.3389/fbinf.2025.1671693","url":null,"abstract":"<p><strong>Objective: </strong>Knee osteoarthritis (KOA) is a prevalent chronic degenerative joint disease that causes chronic pain and mobility restrictions in the elderly, significantly impacting quality of life. Current treatments focus on symptom relief, lacking effective interventions targeting the underlying mechanisms. Understanding KOA's molecular mechanisms and identifying key pathogenic genes are essential for developing targeted therapies.</p><p><strong>Methods: </strong>Gene expression data from KOA patients and healthy controls were obtained from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to reveal the associated biological processes and signaling pathways. Protein-protein interaction (PPI) network analysis and Gene Ontology-based semantic similarity calculations were used to identify hub genes. Gene Set Variation Analysis (GSVA) assessed enrichment in KOA-related pathways. Immune infiltration analysis (CIBERSORT) assessed the immune cell distribution in KOA tissues. Finally, hub gene expression changes were validated using the IL-1β-treated CHON-001 cell model and real-time quantitative PCR (RT-qPCR).</p><p><strong>Results: </strong>A total of 3,290 upregulated and 2,536 downregulated DEGs were identified. GO and KEGG enrichment analyses revealed these genes were primarily involved in extracellular matrix remodeling, transmembrane transport, and inflammation-related pathways. Key hub genes, including HSPA5, FOXO1, and YWHAE, were identified. 
GSVA showed that these genes were significantly enriched in multiple KOA-associated signaling pathways. Immune infiltration analysis revealed significant differences in the levels of six immune cell types in KOA tissues, which were associated with the hub genes expression. In CHON-001 cell, the expression levels of GRB2, IKBKG, and HSPA12A were upregulated, whereas YWHAE, HSPB1, and DCAF8 were downregulated, consistent with the tissue samples.</p><p><strong>Conclusion: </strong>This study identified key pathogenic genes and their regulatory pathways in KOA, highlighting their potential role in disease progression via inflammation and immune modulation. These findings provide insights for developing targeted therapeutic strategies for KOA.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1671693"},"PeriodicalIF":3.9,"publicationDate":"2025-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12753952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145890560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-16; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1676359
Innocent Sibanda, Geoff Nitschke
The goal of bioengineering in synthetic biology is to redesign, reprogram, and rewire biological systems for specific applications using standardized parts such as promoters and ribosomes. For example, bioengineered micro-organisms capable of cleaning up environmental pollution or producing antibodies de novo to defend against viral pandemics have been predicted. Artificial Life (ALife) facilitates the design and understanding of living systems, not just those found in nature but life as it could be, while synthetic biology provides the means to realize life as it can be engineered. Despite significant advances, the synthesis of evolving, adaptable, bioengineered, problem-solving ALife is not yet practically feasible, primarily because of limitations in directed evolution, fitness landscape mapping, and fitness approximation. As a result, synthetic (biological) ALife does not currently continue to evolve and adapt to changing tasks and environments. This is in stark contrast to current digital ALife, which continues to adapt and evolve in simulated environments, demonstrating the dictum of life as it could be. We posit that if the bioengineering (on-demand design) of problem-solving ALife is ever to become a reality, the open issues pervading the directed evolution of synthetic ALife must first be addressed. This review examines open challenges in directed evolution, genetic diversity generation, fitness mapping, and fitness estimation, and outlines future directions toward a hybrid synthetic ALife design methodology. It offers a novel perspective on a single (hybridized) evolutionary design methodology that combines digital (in silico) and synthetic (in vitro) evolutionary design methods drawn from bioengineering, digital, and robotic ALife applications, while addressing the highlighted deficiencies of directed evolution.
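The three ingredients the review names (diversity generation, fitness estimation, and selection) can be illustrated with a toy in silico loop. This is only a sketch under stated assumptions, not a method from the review: genomes are bit lists, diversity comes from point mutation, and the `fitness` argument stands in for whatever assay or surrogate model would score a variant. The function name `evolve` and all parameter values are hypothetical.

```python
import random


def evolve(fitness, genome_len=20, pop_size=30, generations=40,
           mut_rate=0.05, seed=0):
    """Toy mutate-and-select loop; returns (best genome, best-fitness history)."""
    rng = random.Random(seed)  # seeded for reproducibility
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Fitness estimation: rank the current population.
        ranked = sorted(pop, key=fitness, reverse=True)
        best_history.append(fitness(ranked[0]))
        # Selection: the top half become parents.
        parents = ranked[: pop_size // 2]
        # Elitism: carry the best genome over unchanged.
        children = [ranked[0][:]]
        # Diversity generation: point mutations on randomly chosen parents.
        while len(children) < pop_size:
            parent = rng.choice(parents)
            children.append([1 - g if rng.random() < mut_rate else g
                             for g in parent])
        pop = children
    return max(pop, key=fitness), best_history


# OneMax (count of 1 bits) as a stand-in fitness function.
best_genome, history = evolve(fitness=sum)
```

Because the best genome is copied forward unchanged each generation, the recorded best fitness never decreases. In a wet-lab setting each fitness call is an experiment, which is why the review treats fitness approximation as a bottleneck.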
{"title":"Bioengineering hybrid artificial life.","authors":"Innocent Sibanda, Geoff Nitschke","doi":"10.3389/fbinf.2025.1676359","DOIUrl":"10.3389/fbinf.2025.1676359","url":null,"abstract":"<p><p>The goal of bioengineering in synthetic biology is to redesign, reprogram, and rewire biological systems for specific applications using standardized parts such as promoters and ribosomes. For example, bioengineered micro-organisms capable of cleaning up environmental pollution or producing antibodies <i>de novo</i> to defend against viral pandemics have been predicted. Artificial Life (ALife) facilitates the design and understanding of living systems, not just those found in nature, but <i>life as it could be</i>, while synthetic biology provides the means to realize <i>life as it can be engineered.</i> Despite significant advances, the synthesis of evolving, adaptable, and bioengineered problem-solving ALife has yet to achieve practical feasibility. This is primarily due to limitations in directed evolution, fitness landscape mapping, and fitness approximation. Thus, currently synthetic (biological) ALife does not continue to evolve and adapt to changing tasks and environments. This is in stark contrast to current digital based ALife that continues to adapt and evolve in simulated environments demonstrating the dictum of <i>life as it could be</i>. We posit that if the bioengineering (on-demand design) of problem solving ALife is to ever become a reality then open issues pervading the directed evolution of synthetic ALife must first be addressed. This review examines open challenges in directed evolution, genetic diversity generation, fitness mapping, and fitness estimation, and outlines future directions toward a hybrid synthetic ALife design methodology. 
This review provides a novel perspective for a singular (hybridized) evolutionary design methodology, combining digital <i>(in silico)</i> and synthetic <i>(in vitro)</i> evolutionary design methods drawn from various bioengineering, digital and robotic ALife applications, while addressing highlighted directed evolution deficiencies.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1676359"},"PeriodicalIF":3.9,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12748196/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145879572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Triple-negative breast cancer (TNBC) is an aggressive and heterogeneous subtype of breast cancer, characterized by the lack of oestrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression, which results in limited treatment options and poor prognosis. Kinesin Family Member C1 (KIFC1), a mitotic motor protein critical for centrosome clustering and spindle formation, plays a critical role in TNBC progression. In this context, natural compounds were explored as potential inhibitors of this protein. We used molecular docking, ADMET profiling, density functional theory calculations, molecular dynamics simulations, MM/GBSA binding free energy analysis, and principal component analysis to evaluate the binding affinity, stability, and drug-likeness of natural compounds against KIFC1. Of the 36,900 compounds screened, five were selected for further assessment: Fosfocytocin, Molybdopterin Compound Z, 5-amino-2-(3-hydroxy-13-methyltetradecanamido) pentanoic acid, TMC-52A, and Muscimol. All five exhibited significant inhibitory efficacy against KIFC1 in the computational evaluations, with persistent interactions with critical residues and favorable binding properties. Together, these results identify all five natural compounds as possible KIFC1 inhibitors for future studies. Nonetheless, their effectiveness and safety must be confirmed through in vitro and in vivo studies before they are considered for clinical application.
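For readers unfamiliar with MM/GBSA, the binding free energy it estimates is conventionally decomposed as below. This is the standard textbook form, given here as background. The abstract does not say which terms the authors included, and the entropy term in particular is often omitted in practice.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= G_{\mathrm{complex}} - \left( G_{\mathrm{receptor}} + G_{\mathrm{ligand}} \right) \\
                         &= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{solv}} - T\,\Delta S, \\
\Delta E_{\mathrm{MM}}   &= \Delta E_{\mathrm{int}} + \Delta E_{\mathrm{ele}} + \Delta E_{\mathrm{vdW}}, \\
\Delta G_{\mathrm{solv}} &= \Delta G_{\mathrm{GB}} + \Delta G_{\mathrm{SA}}.
\end{aligned}
```

Here $\Delta E_{\mathrm{MM}}$ is the gas-phase molecular mechanics energy (internal, electrostatic, and van der Waals terms), $\Delta G_{\mathrm{GB}}$ is the polar solvation term from the generalized Born model, and $\Delta G_{\mathrm{SA}}$ is the nonpolar term estimated from solvent-accessible surface area. A more negative $\Delta G_{\mathrm{bind}}$ indicates more favorable predicted binding.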
{"title":"In silico identification of novel natural compounds as potential KIFC1 inhibitors for the therapeutic intervention of triple-negative breast cancer.","authors":"Prashant Kumar Tiwari, Mukesh Kumar, Richa Mishra, Xiaomeng Zhang, Sanjay Kumar","doi":"10.3389/fbinf.2025.1689172","DOIUrl":"10.3389/fbinf.2025.1689172","url":null,"abstract":"<p><p>TNBC is an aggressive and various subtype of breast cancer, notable by the lack of specific oestrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2), consequential in limited treatment options and poor prognosis. Kinesin Family Member C1 (KIFC1), a mitotic motor protein critical for centrosome clustering and spindle formation, has critical role in TNBC progress. In this situation, natural compounds were explored as probable inhibitors of this protein. we utilized molecular docking, ADMET profiling, density functional theory calculations, molecular dynamics simulations, MM/GBSA binding free energy analysis, and principal component analysis to thoroughly evaluate binding affinity, stability, and drug-likeness property of natural compounds against KIFC1. Of the 36,900 compounds utilized, five natural compounds were carefully chosen for further assessment. All five compounds Fosfocytocin, Molybdopterin Compound Z, 5-amino-2-(3-hydroxy-13-methyltetradecanamido) pentanoic acid, TMC-52A, and Muscimol exhibited significant inhibitory efficacy against KIFC1. These compounds demonstrated persistent interactions with critical residues and had advantageous binding properties in computational evaluations. The results collectively indicate their potential as effective inhibitors for targeting KIFC1 in forthcoming studies. These data collectively identify all five natural compounds as possible inhibitors of KIFC1. 
Nonetheless, their effectiveness and safety must be confirmed through <i>in vivo</i> and <i>in vitro</i> study prior to consideration for clinical application.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1689172"},"PeriodicalIF":3.9,"publicationDate":"2025-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12748000/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145879404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-12-11; eCollection Date: 2025-01-01; DOI: 10.3389/fbinf.2025.1684227
Yuya Sato, Toru Asahi, Kosuke Kataoka
Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for retrieving, preprocessing, integrating, and analyzing these data, because existing tools address only isolated steps and require manual curation of accessions and metadata as well as manual handling of technical variability, known as batch effects. In this study, we developed Celline, a Python package that executes an entire workflow using a single-line command per step. Celline automatically gathers raw scRNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools, including Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference, into one-line commands, enabling seamless integrative analyses. To validate the quality of Celline-acquired data and the practical utility of the integrated framework, we applied it to two mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. Its modular design allows pipeline extension, encourages community-driven advances, and accelerates discovery from single-cell data.
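As an illustration of what a "remove low-quality cells" step typically does, here is a minimal, dependency-free sketch. It does not use or represent Celline's API, which the abstract does not describe. The function name `filter_cells`, the data layout, and both thresholds are assumptions. The only domain convention relied on is that mouse mitochondrial gene symbols start with `mt-`.

```python
def filter_cells(cells, min_genes=200, max_mito_frac=0.10):
    """Return barcodes of cells passing two common QC filters.

    cells: list of {"barcode": str, "counts": {gene: int}} (hypothetical layout).
    A cell is kept if it expresses at least `min_genes` genes and at most
    `max_mito_frac` of its counts come from mitochondrial ("mt-") genes.
    """
    kept = []
    for cell in cells:
        counts = cell["counts"]
        total = sum(counts.values())
        if total == 0:
            continue  # empty droplet: nothing to assess
        n_genes = sum(1 for c in counts.values() if c > 0)
        mito = sum(c for gene, c in counts.items() if gene.startswith("mt-"))
        if n_genes >= min_genes and mito / total <= max_mito_frac:
            kept.append(cell["barcode"])
    return kept


# Toy input with deliberately tiny thresholds so the example is readable.
toy_cells = [
    {"barcode": "AAAC", "counts": {"Actb": 5, "Gapdh": 3, "mt-Co1": 0}},
    {"barcode": "CCCG", "counts": {"Actb": 1, "Gapdh": 1, "mt-Co1": 8}},
    {"barcode": "GGGT", "counts": {"Actb": 4, "Gapdh": 0, "mt-Co1": 0}},
]
kept_barcodes = filter_cells(toy_cells, min_genes=2, max_mito_frac=0.2)
```

In the toy data, the second cell fails the mitochondrial-fraction filter and the third expresses too few genes, so only the first barcode is kept. Production tools apply the same idea to sparse matrices with dataset-specific thresholds.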
{"title":"Celline: a flexible tool for one-step retrieval and integrative analysis of public single-cell RNA sequencing data.","authors":"Yuya Sato, Toru Asahi, Kosuke Kataoka","doi":"10.3389/fbinf.2025.1684227","DOIUrl":"10.3389/fbinf.2025.1684227","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for seamlessly retrieving, preprocessing, integrating, and analyzing these data because existing tools address only isolated steps and require manual curation of accessions, metadata, and technical variability, known as batch effects. In this study, we developed Celline, a Python package that executes an entire workflow using a single-line commands per step. Celline automatically gathers raw single-cell RNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools, including Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference, into one-line commands, enabling seamless integrative analyses. To validate Celline-acquired data quality and the integrated framework's practical utility, we applied it to 2 mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. 
Its modular design allows pipeline extension, encourages community-driven advances, and accelerates the discovery of single-cell data.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1684227"},"PeriodicalIF":3.9,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12738925/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145851856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}