Frontiers in bioinformatics: latest articles, page 3

Stain-free artificial intelligence-assisted light microscopy for the identification of leukocyte morphology change in presence of bacteria.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2026-01-06 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1725145

Alexander Hunt, Holger Schulze, Kay Samuel, Robert B Fisher, Till T Bachmann

Background: Rapid detection of bacterial infections through leukocyte activation analysis could significantly reduce diagnostic timeframes from days to hours. Traditional methods like flow cytometry and biomarker assays face limitations including technical complexity, equipment requirements, and delayed results.

Methods: We developed a dual artificial neural network system combining stain-free light microscopy with microfluidic technology to detect morphological changes in activated leukocytes. YOLOv4 networks were trained using five-fold cross-validation on images of chemically stimulated leukocyte subpopulations (lymphocytes, monocytes, and neutrophils) and validated against flow cytometry. The system was tested on whole blood samples spiked with E. coli at clinically relevant concentrations (10-250 CFU/mL).

Results: The optimized four-class network achieved high performance metrics for lymphocytes (F1 score: 80.1% ± 2.5%) and neutrophils (F1 score: 91.7% ± 1.7%), while a specialized binary classifier for monocytes achieved 97.0% ± 2.8% F1 score. In bacteria-spiked whole blood experiments, the system successfully detected activated leukocytes within 30 min at concentrations approaching clinical blood culture detection limits (11.11 ± 4.79 CFU/mL). Neutrophils showed rapid activation peaking at 1-3 h, while lymphocyte activation increased gradually over 6-12 h, consistent with innate versus adaptive immune response kinetics.

Conclusion: This AI-assisted microscopy platform enables rapid, label-free detection of leukocyte activation in response to bacterial infection with minimal sample handling and no requirement for specialized staining or trained technicians. The technology demonstrates potential for accelerating infection diagnosis and could be extended to other pathologies with morphological manifestations.
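
The per-class F1 scores above are reported as a mean ± spread over five cross-validation folds. The sketch below shows one way such a figure could be aggregated from per-fold detection counts. The counts are made-up placeholders rather than data from the study, and the use of the sample standard deviation is an assumption, since the abstract does not say which spread measure was used.

```python
from statistics import mean, stdev


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize_folds(fold_counts):
    """Return (mean, sample standard deviation) of per-fold F1 scores, in percent."""
    scores = [100 * f1_score(tp, fp, fn) for tp, fp, fn in fold_counts]
    return mean(scores), stdev(scores)


# Hypothetical (TP, FP, FN) counts for one cell class in each of five folds.
example_folds = [(90, 8, 10), (88, 9, 12), (92, 7, 8), (91, 10, 9), (89, 6, 11)]
avg, sd = summarize_folds(example_folds)
print(f"F1: {avg:.1f}% ± {sd:.1f}%")
```

Computing F1 per fold and then averaging, rather than pooling counts across folds, is what makes a ± term meaningful here.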

Citations: 0

A Transformers-based framework for refinement of genetic variants.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1694924

Omar Abdelwahab, Davoud Torkamaneh

Accurate variant calling refinement is crucial for distinguishing true genetic variants from technical artifacts in high-throughput sequencing data. While heuristic filtering and manual review are common approaches for refining variants, manual review is time-consuming, and heuristic filtering often lacks optimal solutions, especially for low-coverage data. Traditional variant calling methods often struggle with accuracy, especially in regions of low read coverage, leading to false-positive or false-negative calls. Advances in artificial intelligence, particularly deep learning, offer promising solutions for automating this refinement process. Here, we present a Transformers-based framework for genetic variant refinement that leverages self-attention to model dependencies among variant features and directly processes VCF files, enabling seamless integration with standard pipelines such as BCFTools and GATK4. Trained on 2 million variants from the GIAB (v4.2.1) sample HG003, the framework achieved 89.26% accuracy and a ROC AUC of 0.88. Across the tested samples, VariantTransformer improved baseline filtering accuracy by 4%-10%, demonstrating consistent gains over the default caller filters. When integrated into conventional variant calling pipelines, VariantTransformer outperformed traditional heuristic filters and, through refinement of existing caller outputs, approached the accuracy achieved by state-of-the-art AI-based variant callers such as DeepVariant, despite not operating as a standalone caller. By positioning this work as a flexible and generalizable framework rather than a single-use model, we highlight the underexplored potential of Transformers for variant refinement in genomics. This study contributes a blueprint for adapting Transformer architectures to a wide range of genomic quality control and filtering tasks. Code is available at: https://github.com/Omar-Abd-Elwahab/VariantTransformer.
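
The abstract describes self-attention modelling dependencies among variant features. As a rough illustration of that mechanism only, the sketch below runs single-head scaled dot-product self-attention over a handful of per-variant annotations, treating each feature as one token. The feature names and values, the embedding scheme, the dimensions, and the random weights are all assumptions made for this example; none of it is taken from the VariantTransformer code.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens holds one embedding vector per variant feature.
    Returns (attended tokens, attention weights).
    """
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Hypothetical, pre-normalized annotations for one variant.
features = {"QUAL": 0.62, "DP": 0.35, "MQ": 0.90, "FS": 0.05}
d_model = 4
rng = random.Random(0)


def random_matrix(rows, cols):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Embed each scalar feature as its value times a feature-specific direction.
directions = random_matrix(len(features), d_model)
tokens = [[value * d for d in direction]
          for value, direction in zip(features.values(), directions)]
w_q, w_k, w_v = (random_matrix(d_model, d_model) for _ in range(3))

attended, weights = self_attention(tokens, w_q, w_k, w_v)
for name, row in zip(features, weights):
    print(name, [round(w, 3) for w in row])
```

Each printed row shows how strongly one feature attends to every other feature. In a trained model these weights would be learned, and the attended tokens would feed a classifier that labels the variant as true or artifactual.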

Citations: 0

A novel and accelerated method for integrated alignment and variant calling from short and long reads.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2026-01-05 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1691056

Jinnan Hu, Donald Freed, Hanying Feng, Hong Chen, Zhipan Li, Haodong Chen

Background: Integrating short- and long-read sequencing technologies has become a promising approach for achieving accurate and comprehensive genomic analysis. Although short-read sequencing (Illumina, etc.) offers high base accuracy and cost efficiency, it struggles with structural variant (SV) detection and complex genomic regions. In contrast, long-read sequencing (PacBio HiFi) excels in resolving large SVs and repetitive sequences but is limited by throughput, higher insertion or deletion (indel) error rates, and sequencing costs. Hybrid approaches may combine these technologies and leverage their complementary strengths and different sources of error to provide higher accuracy, more comprehensive results, and higher throughput by lowering the coverage requirement for the long reads.

Methods: This study benchmarks the DNAscope Hybrid (DS-Hybrid) pipeline, a novel integrated alignment and variant calling framework that combines short- and long-read data sequenced from the same sample. The DNAscope Hybrid pipeline is a bioinformatics pipeline that runs on generic x86 CPUs. We evaluate its performance across multiple human genome reference datasets (HG002-HG004) using the draft Q100 and Genome in a Bottle v4.2.1 benchmarks. The pipeline's ability to detect small variants [single-nucleotide polymorphisms (SNPs)/indels], SVs, and copy-number variations (CNVs) is assessed using data from the Illumina and PacBio sequencing systems at varying read depths (5×-30×). Benchmark results are compared to those of DeepVariant.

Results: The DNAscope Hybrid pipeline significantly improves SNP and indel calling accuracy, particularly in complex genomic regions. At lower long-read depths (e.g., 5×-10×), the hybrid approach outperforms stand-alone short- or long-read pipelines at full sequencing depths (30×-35×), reducing variant calling errors by at least 50%. Additionally, DNAscope Hybrid outperforms leading open-source tools for SV and CNV detection and enhances variant discovery in challenging genomic regions. The pipeline also demonstrates clinical utility by identifying variants in disease-associated genes. Moreover, DNAscope Hybrid is highly efficient, achieving runtimes of less than 90 min on a single standard instance.

Conclusion: The DNAscope Hybrid pipeline is a computationally efficient, highly accurate variant calling framework that leverages the advantages of both short- and long-read sequencing. By improving variant detection in challenging genomic regions and offering a robust solution for clinical and large-scale genomic applications, it holds significant promise for genetic disease diagnostics, population-scale studies, and personalized medicine.
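
The claim of "reducing variant calling errors by at least 50%" can be made concrete with a small calculation. The sketch below assumes that an error is a false positive or a false negative relative to a truth set, which is a common benchmarking convention but is not spelled out in the abstract. All counts are placeholders, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCounts:
    """Counts from comparing a call set against a truth set."""
    tp: int
    fp: int
    fn: int

    @property
    def errors(self) -> int:
        """Total errors, taken here as false positives plus false negatives."""
        return self.fp + self.fn

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)


def error_reduction(baseline: BenchmarkCounts, candidate: BenchmarkCounts) -> float:
    """Fraction of the baseline's errors that the candidate pipeline removes."""
    if baseline.errors == 0:
        raise ValueError("baseline has no errors to reduce")
    return 1 - candidate.errors / baseline.errors


# Placeholder counts for a stand-alone pipeline and a hybrid pipeline.
standalone = BenchmarkCounts(tp=3_990_000, fp=6_000, fn=10_000)
hybrid = BenchmarkCounts(tp=3_995_000, fp=2_500, fn=5_000)

print(f"precision: {standalone.precision:.4f} -> {hybrid.precision:.4f}")
print(f"recall:    {standalone.recall:.4f} -> {hybrid.recall:.4f}")
print(f"error reduction: {error_reduction(standalone, hybrid):.1%}")
```

Reporting the reduction in absolute error counts is more informative than a difference in F1 when both pipelines already score above 99%, because small F1 differences can hide a halving of errors.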

Citations: 0

Visualizing stability: a sensitivity analysis framework for t-SNE embeddings.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2026-01-02 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1719516

Susanne Zabel, Philipp Hennig, Kay Nieselt

t-distributed Stochastic Neighbour Embedding (t-SNE) is a cornerstone for visualizing high-dimensional biological data, where each high-dimensional data point is represented as a point in a two-dimensional map. However, this static map provides no information about the stability of the visual layout, the features that influence it, or the impact of uncertainty in the input data. This work introduces a computational framework that extends the standard t-SNE plot with visual cues about the stability of the t-SNE embedding. First, we perform a sensitivity analysis to determine feature influence: by combining the Implicit Function Theorem with automatic differentiation, our method computes the sensitivity of the embedding with respect to the input data, expressed as a Jacobian of first-order derivatives. Heatmap visualizations of this Jacobian, or summaries of it, reveal which input features are most influential in shaping the embedding and identify regions of structural instability. Second, when input data uncertainty is available, our framework uses this Jacobian to propagate error, probabilistically quantifying the positional uncertainty of each embedded point. This uncertainty is visualized by augmenting the plot with hypothetical outcomes, which display the positional confidence of each point. We apply our framework to three diverse biological datasets (bulk RNA-seq, proteomics, and single-cell transcriptomics), demonstrating its ability to directly link visual patterns to their underlying biological drivers and reveal ambiguities invisible in a standard plot. By providing this principled means to assess the robustness and interpretability of t-SNE visualizations, our work enables more rigorous and informed scientific conclusions in bioinformatics.
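
The sensitivity computation rests on the Implicit Function Theorem: if the embedding y*(x) minimizes a cost C(x, y), then dy*/dx = -(∂²C/∂y²)⁻¹ (∂²C/∂y∂x) at the optimum. The sketch below applies that identity to a small toy cost, not to the t-SNE objective, with hand-written derivatives standing in for automatic differentiation. The toy cost, the Newton solver, and the first-order propagation rule Σ_y ≈ J Σ_x Jᵀ are assumptions chosen for illustration.

```python
def grad_y(x, y):
    """dC/dy for the toy cost C = 0.5*|y - x|^2 + 0.25*(y0^4 + y1^4) + 0.5*y0*y1."""
    return [y[0] - x[0] + y[0] ** 3 + 0.5 * y[1],
            y[1] - x[1] + y[1] ** 3 + 0.5 * y[0]]


def hess_yy(y):
    """Second derivative of the toy cost with respect to y (always positive definite)."""
    return [[1 + 3 * y[0] ** 2, 0.5],
            [0.5, 1 + 3 * y[1] ** 2]]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def embed(x, steps=50):
    """Find the minimizer y*(x) of the toy cost by Newton iteration."""
    y = [0.0, 0.0]
    for _ in range(steps):
        g = grad_y(x, y)
        h_inv = inv2(hess_yy(y))
        y = [y[i] - (h_inv[i][0] * g[0] + h_inv[i][1] * g[1]) for i in range(2)]
    return y


def sensitivity(x):
    """Jacobian dy*/dx from the Implicit Function Theorem."""
    y_star = embed(x)
    mixed = [[-1.0, 0.0], [0.0, -1.0]]  # d2C/dy dx for the toy cost
    h_inv = inv2(hess_yy(y_star))
    return [[-v for v in row] for row in matmul2(h_inv, mixed)]


def propagate(jac, cov_x):
    """First-order error propagation: cov_y = J cov_x J^T."""
    jac_t = [list(col) for col in zip(*jac)]
    return matmul2(matmul2(jac, cov_x), jac_t)


x = [1.0, -0.5]
jac = sensitivity(x)
cov_y = propagate(jac, [[0.01, 0.0], [0.0, 0.04]])  # assumed input variances
print("embedding:", embed(x))
print("Jacobian:", jac)
print("output covariance:", cov_y)
```

The Jacobian is obtained from a single linear solve at the optimum instead of differentiating through every optimizer step, which is the practical appeal of the implicit approach.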

Citations: 0

Network-based insights into miR-30a-5p-mediated regulation and EGCG targeting in triple-negative breast cancer.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2025-12-19 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1735106

Loganathan Chandramani Priya Dharshini, Abul Kalam Azad Mandal

Background: Triple-negative breast cancer (TNBC) is defined by the absence of ER, PR, and HER2 expression. This limits the options for targeted therapy, resulting in poor clinical outcomes. Identifying molecular targets that can be regulated through miRNAs and natural compounds offers a potential therapeutic platform.

Methods: We combined transcriptomic profiling with miRNA target prediction to identify genes regulated by miR-30a-5p and assess their interaction with the green tea polyphenol, epigallocatechin gallate (EGCG). Differentially expressed genes (DEGs) from TCGA-TNBC datasets and miRNA targets from miRDB, TargetScan, and miRTarBase were screened for common genes. Then, the protein-protein interaction and network topology analyses were performed to identify key hub genes. Molecular docking and simulation were carried out with the four key genes against EGCG.

Results: Data integration yielded 393 overlapping genes and identified ten hub genes: RRM2, KIF11, ANLN, CDC20, CCNA1, AGO2, YWHAZ, DTL, SKP2, and PCNA. Pathway enrichment showed that all these hubs are involved in cell cycle and mitotic regulation, which was associated with poor TNBC prognosis. Mutation profiling revealed high alteration rates in KIF11, ANLN, CDC20, and YWHAZ, with increased missense mutations and C>T transitions. Molecular docking and simulations identified YWHAZ as the most favorable and structurally stable EGCG-binding target, compared to the other three key genes.

Conclusion: The results emphasize that EGCG has strong binding affinity towards YWHAZ, revealing that miR-30a-EGCG targets TNBC synergistically through cell-cycle-mediated pathways. The findings give rational support for miRNA-guided phytochemical-based TNBC therapeutic development.
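
The screening workflow in the Methods (overlap DEGs with predicted miRNA targets, then rank genes in the interaction network) can be sketched with plain set operations. The example assumes that "common genes" means genes present in the DEG list and in every target database, and that hubs are ranked by node degree. The abstract states neither choice explicitly. Gene names and edges are placeholders.

```python
from collections import Counter


def common_genes(degs, *target_sets):
    """Genes that are differentially expressed and listed by every target database."""
    shared_targets = set.intersection(*map(set, target_sets))
    return set(degs) & shared_targets


def hub_genes(edges, genes, top_n):
    """Rank genes by degree in the interaction network restricted to `genes`.

    Ties are broken alphabetically; each listed edge is counted once.
    """
    degree = Counter()
    for a, b in edges:
        if a in genes and b in genes and a != b:
            degree[a] += 1
            degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [gene for gene, _ in ranked[:top_n]]


# Placeholder inputs; real lists would come from the DEG analysis and the databases.
degs = {"G1", "G2", "G3", "G4", "G5", "G6"}
mirdb = {"G1", "G2", "G3", "G4", "G7"}
targetscan = {"G1", "G2", "G3", "G4", "G5"}
mirtarbase = {"G1", "G2", "G3", "G4", "G8"}
ppi_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G5", "G1")]

overlap = common_genes(degs, mirdb, targetscan, mirtarbase)
print(sorted(overlap))
print(hub_genes(ppi_edges, overlap, top_n=2))
```

Requiring agreement across all databases trades recall for confidence. Taking the union of the databases instead would yield a larger candidate list with more false positives.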

Citations: 0

Pan-cancer analyses identify oncogenic drivers, expression signatures, and therapeutic vulnerabilities in RHO GTPase pathway genes.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2025-12-17 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1708800

Rubén Fernández, L Francisco Lorenzo-Martín, Víctor Quesada, Xosé R Bustelo

RHO family GTPases are key regulators of cancer-related processes such as cytoskeletal dynamics and cell migration, proliferation, and survival. Despite this, a comprehensive understanding of RHO signaling alterations across tumors is still lacking. In this study, we present a pan-cancer analysis of 484 genes encoding RHO GTPases, regulators, proximal effectors, distal downstream signaling elements, and components of their proximal interactomes using data from over 10,000 tumor samples and 33 tumor types present in The Cancer Genome Atlas (TCGA). In addition, we have utilized available data from genome-wide functional dependency screens performed in more than 1,000 gene-edited cancer cell lines. This study has uncovered positively selected mutations in both well-known and previously uncharacterized RHO pathway genes. Transcriptomic profiling reveals widespread and tumor-specific differential expression patterns, with some of them correlating with copy number changes. Interestingly, certain regulators exhibit consistent expression profiles across tumors opposite to those predicted from their canonical roles. Co-expression and gene set enrichment analyses highlight coordinated transcriptional programs involving some RHO GTPase pathway genes and their linkage to key cancer hallmarks, including extracellular matrix reorganization, cell motility, cell cycle progression, cell survival, and immune modulation. Functional screens further identify context-specific dependencies on several deregulated RHO GTPase pathway genes. Altogether, this study provides a comprehensive map of RHO GTPase pathway alterations in cancer and identifies new oncogenic drivers, expression-based signatures, and therapeutic vulnerabilities that could guide future mechanistic and translational research in this area.

Frontiers in bioinformatics, volume 5, article 1708800. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12753894/pdf/

Citations: 0

Identification and functional analysis of hub genes in knee osteoarthritis via bioinformatics and experimental validation.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2025-12-17 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1671693

Shanyong Jiang, Jingjing Cao, Jianshu Lu, Jianxiao Liang, Lianxin Li, Yanqiang Song, Jincheng Gao, Baoen Jiang

Objective: Knee osteoarthritis (KOA) is a prevalent chronic degenerative joint disease that causes chronic pain and mobility restrictions in the elderly, significantly impacting quality of life. Current treatments focus on symptom relief, lacking effective interventions targeting the underlying mechanisms. Understanding KOA's molecular mechanisms and identifying key pathogenic genes are essential for developing targeted therapies.

Methods: Gene expression data from KOA patients and healthy controls were obtained from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to reveal the associated biological processes and signaling pathways. Protein-protein interaction (PPI) network analysis and Gene Ontology-based semantic similarity calculations were used to identify hub genes. Gene Set Variation Analysis (GSVA) assessed enrichment in KOA-related pathways. Immune infiltration analysis (CIBERSORT) assessed the immune cell distribution in KOA tissues. Finally, hub gene expression changes were validated using the IL-1β-treated CHON-001 cell model and real-time quantitative PCR (RT-qPCR).

Results: A total of 3,290 upregulated and 2,536 downregulated DEGs were identified. GO and KEGG enrichment analyses revealed that these genes were primarily involved in extracellular matrix remodeling, transmembrane transport, and inflammation-related pathways. Key hub genes, including HSPA5, FOXO1, and YWHAE, were identified. GSVA showed that these genes were significantly enriched in multiple KOA-associated signaling pathways. Immune infiltration analysis revealed significant differences in the levels of six immune cell types in KOA tissues, which were associated with hub gene expression. In CHON-001 cells, the expression levels of GRB2, IKBKG, and HSPA12A were upregulated, whereas YWHAE, HSPB1, and DCAF8 were downregulated, consistent with the tissue samples.

Conclusion: This study identified key pathogenic genes and their regulatory pathways in KOA, highlighting their potential role in disease progression via inflammation and immune modulation. These findings provide insights for developing targeted therapeutic strategies for KOA.
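
The DEG step in the Methods can be illustrated with a minimal sketch. The abstract does not state which software, test, or cutoffs were used, so everything here is an assumption for illustration: Benjamini-Hochberg adjustment of per-gene p-values, an absolute log2 fold-change cutoff of 1, and an adjusted p-value cutoff of 0.05. The function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original gene order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(log2fc, pvalues, lfc_cutoff=1.0, alpha=0.05):
    """Label each gene 'up', 'down', or 'ns' (cutoffs are assumed, not from the paper)."""
    labels = []
    for lfc, q in zip(log2fc, benjamini_hochberg(pvalues)):
        if q < alpha and lfc >= lfc_cutoff:
            labels.append("up")
        elif q < alpha and lfc <= -lfc_cutoff:
            labels.append("down")
        else:
            labels.append("ns")
    return labels


# Toy input: four genes with made-up statistics, for illustration only.
labels = classify_degs([2.0, -1.5, 0.1, 0.5], [0.001, 0.002, 0.8, 0.04])
```

Counting the "up" and "down" labels over a full expression matrix would give totals of the kind reported in the Results, although the actual numbers depend on the test and thresholds the authors chose.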

Frontiers in bioinformatics, volume 5, article 1671693. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12753952/pdf/

Citations: 0

Bioengineering hybrid artificial life.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2025-12-16 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1676359

Innocent Sibanda, Geoff Nitschke

The goal of bioengineering in synthetic biology is to redesign, reprogram, and rewire biological systems for specific applications using standardized parts such as promoters and ribosomes. For example, bioengineered micro-organisms capable of cleaning up environmental pollution or producing antibodies de novo to defend against viral pandemics have been predicted. Artificial Life (ALife) facilitates the design and understanding of living systems, not just those found in nature, but life as it could be, while synthetic biology provides the means to realize life as it can be engineered. Despite significant advances, the synthesis of evolving, adaptable, and bioengineered problem-solving ALife has yet to achieve practical feasibility. This is primarily due to limitations in directed evolution, fitness landscape mapping, and fitness approximation. Thus, synthetic (biological) ALife currently does not continue to evolve and adapt to changing tasks and environments. This is in stark contrast to current digital-based ALife, which continues to adapt and evolve in simulated environments, demonstrating the dictum of life as it could be. We posit that if the bioengineering (on-demand design) of problem-solving ALife is ever to become a reality, then open issues pervading the directed evolution of synthetic ALife must first be addressed. This review examines open challenges in directed evolution, genetic diversity generation, fitness mapping, and fitness estimation, and outlines future directions toward a hybrid synthetic ALife design methodology. It offers a novel perspective on a singular (hybridized) evolutionary design methodology that combines digital (in silico) and synthetic (in vitro) evolutionary design methods drawn from various bioengineering, digital, and robotic ALife applications, while addressing the highlighted deficiencies of directed evolution.
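
The digital side of such a loop (diversity generation, fitness estimation, selection) can be sketched as a toy in silico evolutionary algorithm. This is not a method taken from the review. The bit-string genome, the match-counting fitness function, the mutation rate, and truncation selection with elitism are all assumptions chosen to keep the example small.

```python
import random


def fitness(genome, target):
    """Count positions matching the target (a stand-in for a measured assay score)."""
    return sum(g == t for g, t in zip(genome, target))


def mutate(genome, rate, rng):
    """Flip each bit independently with probability `rate` (diversity generation)."""
    return [1 - g if rng.random() < rate else g for g in genome]


def evolve(target, pop_size=20, generations=30, rate=0.05, seed=0):
    """Run mutation plus truncation selection; return the best genome and fitness history."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in target] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Selection: rank by fitness and keep the fitter half as parents.
        population.sort(key=lambda g: fitness(g, target), reverse=True)
        parents = population[: pop_size // 2]
        history.append(fitness(parents[0], target))
        # Elitism: parents survive unchanged; mutated copies refill the population.
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    best = max(population, key=lambda g: fitness(g, target))
    return best, history


target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
best, history = evolve(target)
```

Because the parents are carried over unchanged, the best fitness per generation never decreases. In a wet-lab setting the `fitness` call would be replaced by an experimental measurement or a learned approximation, which is where the fitness-estimation challenges discussed in the review arise.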

Frontiers in bioinformatics, volume 5, article 1676359. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12748196/pdf/

Citations: 0

In silico identification of novel natural compounds as potential KIFC1 inhibitors for the therapeutic intervention of triple-negative breast cancer.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2025-12-16 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1689172

Prashant Kumar Tiwari, Mukesh Kumar, Richa Mishra, Xiaomeng Zhang, Sanjay Kumar

Triple-negative breast cancer (TNBC) is an aggressive and heterogeneous subtype of breast cancer, characterized by the lack of oestrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) expression, resulting in limited treatment options and poor prognosis. Kinesin Family Member C1 (KIFC1), a mitotic motor protein critical for centrosome clustering and spindle formation, plays a critical role in TNBC progression. In this context, natural compounds were explored as potential inhibitors of this protein. We used molecular docking, ADMET profiling, density functional theory calculations, molecular dynamics simulations, MM/GBSA binding free energy analysis, and principal component analysis to thoroughly evaluate the binding affinity, stability, and drug-likeness of natural compounds against KIFC1. Of the 36,900 compounds screened, five natural compounds were selected for further assessment. All five compounds (Fosfocytocin, Molybdopterin Compound Z, 5-amino-2-(3-hydroxy-13-methyltetradecanamido) pentanoic acid, TMC-52A, and Muscimol) exhibited significant inhibitory efficacy against KIFC1 in the computational evaluations. These compounds demonstrated persistent interactions with critical residues and had advantageous binding properties. Collectively, the results identify all five natural compounds as possible KIFC1 inhibitors for forthcoming studies. Nonetheless, their effectiveness and safety must be confirmed through in vitro and in vivo studies prior to consideration for clinical application.
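
For readers unfamiliar with the MM/GBSA step, the binding free energy is conventionally estimated from the standard decomposition below. The abstract does not report which terms were computed, whether the entropy term was included, or which force field and solvent model were used, so this is the textbook form rather than the authors' exact protocol.

```latex
\begin{align}
\Delta G_{\text{bind}} &= G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right) \\
G &= E_{\text{MM}} + G_{\text{solv}} - T S \\
E_{\text{MM}} &= E_{\text{int}} + E_{\text{ele}} + E_{\text{vdW}} \\
G_{\text{solv}} &= G_{\text{GB}} + G_{\text{SA}}
\end{align}
```

Here $E_{\text{MM}}$ is the gas-phase molecular mechanics energy, $G_{\text{GB}}$ the polar solvation energy from the generalized Born model, and $G_{\text{SA}}$ the nonpolar contribution estimated from the solvent-accessible surface area. A more negative $\Delta G_{\text{bind}}$ indicates more favorable predicted binding.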

Frontiers in bioinformatics, volume 5, article 1689172. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12748000/pdf/

Citations: 0

Celline: a flexible tool for one-step retrieval and integrative analysis of public single-cell RNA sequencing data.

IF 3.9 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Frontiers in bioinformatics

Pub Date : 2025-12-11 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1684227

Yuya Sato, Toru Asahi, Kosuke Kataoka

Single-cell RNA sequencing (scRNA-seq) has generated a rapidly expanding collection of public datasets that provide insight into development, disease, and therapy. However, researchers lack an end-to-end solution for seamlessly retrieving, preprocessing, integrating, and analyzing these data because existing tools address only isolated steps and require manual curation of accessions, metadata, and technical variability, known as batch effects. In this study, we developed Celline, a Python package that executes an entire workflow using a single-line command per step. Celline automatically gathers raw single-cell RNA-seq data from multiple public repositories and extracts metadata using large language models. It then wraps established tools, including Scrublet for doublet removal, Seurat and Scanpy for quality control and cell-type annotation, Harmony and scVI for batch correction, and Slingshot for trajectory inference, into one-line commands, enabling seamless integrative analyses. To validate the quality of Celline-acquired data and the practical utility of the integrated framework, we applied it to two mouse brain cortex datasets from embryonic days 14.5 and 18. Technical validation demonstrated that Celline successfully retrieved data, standardized metadata, and enabled standard analyses that removed low-quality cells, annotated 11 major cell types, improved integration quality (scIB score +0.22), and completed trajectory analysis. Thus, Celline transforms scattered public scRNA-seq resources into unified, analysis-ready datasets with minimal effort. Its modular design allows pipeline extension, encourages community-driven advances, and accelerates discovery from single-cell data.
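
The kind of quality-control step that such a pipeline wraps can be sketched in plain Python. This is not Celline's API, which the abstract does not describe. The function name, the dictionary layout, and the thresholds (minimum detected genes, maximum mitochondrial fraction) are hypothetical and serve only to show what "removing low-quality cells" typically means.

```python
def filter_cells(cells, min_genes=200, max_mito_frac=0.1):
    """Return barcodes of cells passing two common QC criteria (assumed thresholds)."""
    kept = []
    for cell in cells:
        counts = cell["counts"]
        total = sum(counts.values())
        if total == 0:
            continue  # empty droplet: no reads at all
        # Number of genes with at least one read in this cell.
        n_genes = sum(1 for c in counts.values() if c > 0)
        # Mitochondrial genes are prefixed "mt-" in mouse and "MT-" in human.
        mito = sum(c for gene, c in counts.items() if gene.lower().startswith("mt-"))
        if n_genes >= min_genes and mito / total <= max_mito_frac:
            kept.append(cell["barcode"])
    return kept


# Toy input with made-up counts; min_genes is lowered so the tiny example is meaningful.
cells = [
    {"barcode": "A", "counts": {"Actb": 5, "Gapdh": 3, "mt-Co1": 0}},
    {"barcode": "B", "counts": {"Actb": 1, "mt-Co1": 9}},
    {"barcode": "C", "counts": {}},
]
kept = filter_cells(cells, min_genes=2)
```

In practice this filtering would be delegated to established tools such as Scanpy or Seurat, as the abstract describes. The sketch only makes the underlying criteria explicit.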

Frontiers in bioinformatics, volume 5, article 1684227. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12738925/pdf/

Citations: 0