Briefings in bioinformatics: latest articles (page 3)

Ab initio detection of multiple epitranscriptomic modifications from Oxford nanopore technology direct RNA sequencing data.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf709

Adriano Fonzino, Bruno Fosso, Grazia Visci, Carmela Gissi, Graziano Pesole, Ernesto Picardi

Charting the eukaryotic epitranscriptome by direct RNA sequencing is promising but still very challenging, as current bioinformatics tools are based on modification-unaware software and require multiple modification-specific learning steps. Here, we introduce NanoSpeech, a modification-aware basecaller for the ab initio simultaneous detection of multiple modified bases using a transformer model, and NanoListener, which implements a simulated randomers strategy for robust training datasets and a new generation of ONT basecallers. NanoListener and NanoSpeech are independent of the specific ONT chemistry. Once a training dataset has been created, a single model with an expanded vocabulary can accurately basecall both unmodified and modified bases.
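
The idea of a single model with an expanded vocabulary can be illustrated with a small decoding sketch. The token names below (`m6A`, `psi`) and the mapping table are assumptions made for illustration, not NanoSpeech's actual vocabulary or API; the sketch only shows how one output alphabet could carry both the canonical sequence and the modification calls.

```python
# Hypothetical expanded vocabulary: canonical RNA bases plus two example
# modified-base tokens. The token names are illustrative assumptions.
CANONICAL = {"A": "A", "C": "C", "G": "G", "U": "U", "m6A": "A", "psi": "U"}


def split_calls(tokens):
    """Collapse modification-aware tokens into a canonical sequence
    and a list of (position, modification) calls."""
    sequence = []
    modifications = []
    for position, token in enumerate(tokens):
        sequence.append(CANONICAL[token])
        # Any token outside the four canonical bases counts as a modification.
        if token not in ("A", "C", "G", "U"):
            modifications.append((position, token))
    return "".join(sequence), modifications


if __name__ == "__main__":
    print(split_calls(["G", "m6A", "C", "psi", "U"]))
```

With a vocabulary of this shape, a downstream tool would need no separate per-modification classifier: the modification calls fall out of the same decoding pass as the sequence.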

Citations: 0

CSRefiner: a lightweight framework for fine-tuning cell segmentation models with small datasets.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf718

Can Shi, Yumei Li, Jing Guo, Qiuling Chen, Tingting Cao, Sha Liao, Ao Chen, Mei Li, Ying Zhang

Recent advances in spatial omics technologies have enabled transcriptome profiling at subcellular resolution. By performing cell segmentation on nuclear or membrane staining images, researchers can acquire single cell level spatial gene expression data, which in turn enables subsequent biological interpretation. Although deep learning-based segmentation models achieve high overall accuracy, their performance remains suboptimal for whole-tissue analysis, particularly in ensuring consistent segmentation accuracy across diverse cell populations. Existing fine-tuning approaches often require extensive retraining or are tailored to specific model architectures, limiting their adaptability and scalability in practical settings. To address these challenges, we present CSRefiner, a lightweight and efficient fine-tuning framework for precise whole-tissue single-cell spatial expression analysis. Our approach incorporates support for fine-tuning widely used segmentation models in the field of spatial omics, while achieving high accuracy with very limited annotated data. This study demonstrates CSRefiner's superior performance across various staining types and its compatibility with multiple mainstream models. Combining operational simplicity with robust accuracy, our framework offers a practical solution for real-world spatial transcriptomics applications.

Journal Article, volume 27, issue 1. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796817/pdf/

Citations: 0

EpiXFormer: a cross-attention neural network for predicting cell type-specific transcription factor binding sites.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf721

Yonglin Peng, Xinhua Liu, Jun Wu, Sang Lin, Shengxuan Zhan, Hua Li, Ju Wang, Xiaodong Zhao

Transcription factors (TFs) bind to specific sequences in the genome to regulate gene expression and specify cell states. TF binding sites (TFBSs) are cell type-specific, which can be attributed to epigenomic contexts. Comprehensive profiling of TFBSs across various cell types through experimental approaches is neither practical nor cost-friendly. Accurately identifying cell type-specific TFBSs through computational approaches remains challenging. Here, we develop EpiXFormer, a novel transformer-based neural network for cell type-specific TFBS prediction. EpiXFormer achieves exceptional performance in predicting binding sites of DNA-binding proteins (DBPs) across a diverse collection of cell types. It models the effects of proximal and distal epigenomic information on DBP binding and learns the identified motifs of the examined TFs and their potential co-occurring proteins. Moreover, we demonstrate that EpiXFormer can infer pioneer factors during cell type transition and delineate the cell type-specific regulatory functions of TFs. Overall, EpiXFormer enables cell type-specific TFBS prediction in the examined cell lines and is readily applied to other cell types of interest. It provides a robust, scalable framework for characterizing and interpreting multimodal genomic data.

Journal Article, volume 27, issue 1. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796812/pdf/

Citations: 0

Clustering single-cell multi-omics data via multi-subspace contrastive learning with structural smoothness.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbag005

Yun Ding, Yangzhen Jiang, Jing Wang, Dayu Tan, Yansen Su, Chunhou Zheng

The integration of single-cell multi-omics data can uncover the underlying regulatory basis of diverse cell types and states. However, single-cell data inherently suffer from high levels of noise, sparsity, and intercellular heterogeneity, which pose significant challenges to the accuracy and robustness of clustering algorithms. Most existing multi-omics clustering approaches primarily focus on the integration of omics individuality and commonality across modalities, but they ignore the diverse feature extraction of the low-dimensional representation before the fusion of single-cell multi-omics data, and the feature smoothing consistency of the diverse features after the fusion of single-cell multi-omics data. To address the above issues, we propose a novel multi-subspace contrastive learning with structural smoothness method for single-cell multi-omics data clustering (scMUSCLE), which is designed to address the challenges inherent in multi-omics data integration. First, the proposed scMUSCLE method leverages the degree structure to enhance structural diversity of each omics modality. Second, we perform multi-subspace contrastive learning to improve the diversity exploration across multi-omics features. Next, we propose an adaptive graph convolution clustering module, which establishes an adaptive feedback mechanism between intra-cluster smoothness and the downstream clustering task. Extensive experiments on four benchmark multi-omics datasets demonstrate its effectiveness and robustness. The source code can be found on the GitHub repository: https://github.com/GodIsGad/scMUSCLE.

Journal Article, volume 27, issue 1. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834668/pdf/

Citations: 0

Advances in scCUT&Tag and computational analysis for single-cell gene regulatory element mapping.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbag015

Jun Wu, Md Wahiduzzaman, Pengfei Yin, Puxuan Sun, Haoping Chen, Yongwen Ding, Jiankang Wang

Histone modifications (HMs) and transcription factors (TFs) are central to chromatin dynamics and transcriptional regulation. Conventional bulk approaches like ChIP-seq require large cell populations, limiting applicability to heterogeneous studies and tissue samples. In contrast, single-cell cleavage under targets and tagmentation (scCUT&Tag) and its variants have enabled high-resolution profiling of HMs and TFs for investigating gene regulatory mechanisms in individual cells, transformatively broadening single-cell epigenomics beyond chromatin accessibility measured by scATAC-seq. Despite rapid advances in scCUT&Tag-related methods and the accumulation of ~21 public datasets, a systematic overview of the current research status, especially the forefront of computational analysis and ensuing challenges, remains lacking. Here, we comprehensively overview current scCUT&Tag studies from a bioinformatics perspective. We catalog representative applications spanning diverse chromatin features, experimental designs, and data characteristics. We delineate a typical computational workflow from matrix generation to downstream functional annotations, emphasizing distinctions from scATAC-seq analysis, and highlighting critical analytical considerations. We extensively survey commonly used computational tools and key algorithms, compare analytical features between scCUT&Tag and scATAC-seq, and discuss major challenges in integrative analysis. This work provides a structured reference for understanding the current research landscape of scCUT&Tag and offers computational perspectives for researchers aiming to explore gene regulatory machinery at single-cell resolution.

Journal Article, volume 27, issue 1. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853305/pdf/

Citations: 0

MoAGNN: a multi-omics hierarchical graph neural network for subtype classification and prognosis prediction in lung adenocarcinoma.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf735

Cheng-Pei Lin, Yann-Jen Ho, Yen-Peng Chiu, Yun Tang, You Sheng Paik, Guan-Ting Chen, Wei-Chih Huang, Tzong-Yi Lee

Lung adenocarcinoma (LUAD), the most common subtype of nonsmall cell lung cancer, exhibits substantial molecular heterogeneity, complicating subtype classification, progression assessment, and treatment decision-making. Advances in high-throughput sequencing enable multi-omics analysis to reveal cancer mechanisms and biomarkers, yet the high dimensionality, heterogeneity, and interrelationships of omics layers such as transcriptome, microRNA expression, methylome, and copy number variation remain challenging to integrate through conventional methods. Most existing graph-based approaches represent patients as nodes, obscuring gene-level regulatory dynamics and limiting biological interpretability. To address this, we propose the Multi-omics Hierarchical Graph Neural Network (MoAGNN), a novel architecture that represents genes as nodes, integrates four omics, and leverages graph convolution with self-attention-based graph pooling to identify informative molecular nodes, thereby enhancing predictive performance and interpretability for LUAD subtype classification, tumor staging, and prognosis prediction. Multi-omics datasets from The Cancer Genome Atlas (TCGA) were used and results showed that MoAGNN achieved a test accuracy of 0.89 for LUAD subtype classification, outperforming conventional models (Random Forest, Support Vector Machine and Multi-Layer Perceptron) as well as state-of-the-art graph-based models MoGCN, a multi-omics integration model based on graph convolutional network, and MOGLAM, an end-to-end interpretable multi-omics integration method. Furthermore, we validated the generalizability of this framework on the GSE81089 dataset, demonstrating its potential applicability to clinically relevant risk assessment. Subsequent functional enrichment and survival analyses validated the biological relevance of the key genes identified by MoAGNN, supporting their potential roles in LUAD progression, and suggesting the broader applicability of this framework in multi-omics cancer research.
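
The node-selection step behind attention-based graph pooling can be sketched in a few lines. This is a generic top-k pooling illustration, not MoAGNN's implementation: in a real model the per-node scores would come from a graph-convolution layer, whereas here they are plain numbers supplied by hand, and the `ratio` parameter is an assumed hyperparameter name.

```python
import math


def top_k_pool(scores, ratio=0.5):
    """Return the indices of the highest-scoring nodes,
    keeping ceil(ratio * n) of them (at least one)."""
    keep = max(1, math.ceil(ratio * len(scores)))
    # Rank node indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Return the retained indices in their original node order.
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Four hypothetical gene nodes with made-up attention scores.
    print(top_k_pool([0.1, 0.9, 0.4, 0.7], ratio=0.5))
```

When genes are the nodes, the retained indices double as a shortlist of candidate informative genes, which is the kind of output that functional enrichment and survival analyses can then examine.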

Journal Article, volume 27, issue 1. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814971/pdf/

Citations: 0

BHMnet: Bayesian high-dimensional mediation analysis with network information integration for correlated mediators.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf734

Yunju Im, Yuan Huang

We consider identifying a small yet meaningful set of active mediators from a high-dimensional pool of potential mediators, commonly derived from "-omics" or imaging data. In these contexts, mediators are often correlated or exhibit network structures, which present unique opportunities to improve efficacy by using this valuable information. To this end, we develop a Bayesian method that accommodates both high dimensionality and correlations among the mediators. Our approach flexibly learns the interconnection between the mediators while improving estimation accuracy by incorporating external knowledge about these relationships. Simulation studies demonstrate the effectiveness of the proposed method compared with alternative approaches. The analysis of the environmental toxicity data provides new insights into the intermediate effects of molecular-level traits.
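
What makes a mediator "active" can be illustrated with the product-of-coefficients view that is standard in linear mediation analysis. The sketch below is an assumption-laden simplification rather than the BHMnet model: it takes point estimates as given (in a Bayesian analysis these would be posterior summaries), ignores the network prior entirely, and uses hypothetical names and a hypothetical tolerance.

```python
def indirect_effects(alpha, beta):
    """alpha[j]: effect of the exposure on mediator j.
    beta[j]: effect of mediator j on the outcome.
    The indirect effect through mediator j is their product."""
    return [a * b for a, b in zip(alpha, beta)]


def active_mediators(alpha, beta, tol=1e-8):
    """A mediator is active only if both of its paths are non-zero,
    i.e. its indirect effect is non-negligible."""
    effects = indirect_effects(alpha, beta)
    return [j for j, effect in enumerate(effects) if abs(effect) > tol]


if __name__ == "__main__":
    alpha = [0.5, 0.0, 2.0]  # made-up exposure-to-mediator effects
    beta = [2.0, 3.0, 0.0]   # made-up mediator-to-outcome effects
    print(indirect_effects(alpha, beta))
    print(active_mediators(alpha, beta))
```

In the made-up example only the first mediator is active: the second is unaffected by the exposure and the third has no effect on the outcome, so both products vanish.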

Citations: 0

UBD: incorporating uncertainty in cell type proportion estimates from bulk samples to infer cell-type-specific profiles.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf711

Youshu Cheng, Chen Lin, Hongyu Li, Ke Xu, Hongyu Zhao

Statistical deconvolution methods offer a powerful solution for estimating cell-type-specific (CTS) profiles from readily available bulk tissue data. However, a critical limitation of existing methods is that they require the knowledge of cell type proportions of individuals in the bulk data. While the ground truth of cell type proportions in bulk samples is unknown, these methods use the estimated proportions to approximate the truth, which potentially introduces additional uncertainties in the inferred CTS profiles. To address this challenge, we propose Uncertainty-aware Bayesian Deconvolution (UBD) to incorporate uncertainty in cell type proportion estimates. By explicitly modeling the uncertainty in the initial estimates, UBD refines cell type proportions and estimates sample-level CTS data simultaneously. We show that UBD can improve the estimates of CTS profiles through extensive simulations. We further demonstrate the utility of UBD to reveal more CTS signals in its applications to two real datasets.
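
Why errors in the proportion estimates matter can be seen from the linear mixing model that deconvolution methods commonly assume, in which bulk expression is the proportion-weighted sum of the CTS profiles. The sketch below uses that generic model with made-up numbers; it is not the UBD model, and the function and variable names are hypothetical.

```python
def mix(proportions, profiles):
    """Bulk expression per gene as the proportion-weighted sum of
    cell-type-specific profiles, with profiles[k][g] the expression
    of gene g in cell type k."""
    n_genes = len(profiles[0])
    return [
        sum(p * profile[g] for p, profile in zip(proportions, profiles))
        for g in range(n_genes)
    ]


if __name__ == "__main__":
    # Two cell types, two genes (made-up values).
    profiles = [[2.0, 4.0], [6.0, 8.0]]
    print(mix([0.5, 0.5], profiles))    # bulk under one set of proportions
    print(mix([0.25, 0.75], profiles))  # bulk under a different set
```

The same profiles yield a different bulk signal under each set of proportions, so a method that inverts this relationship with mis-estimated proportions attributes the discrepancy to the CTS profiles.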

Citations: 0

GFSeeker: a splicing-graph-based approach for accurate gene fusion detection from long-read RNA sequencing data.

IF 7.7, Zone 2 (Biology), Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf702

Bingyan Wang, Heng Hu, Runtian Gao, Guohua Wang, Tao Jiang

Gene fusions are critical oncogenic drivers and therapeutic targets in diverse cancers. Long-read ribonucleic acid sequencing (RNA-seq) offers an unprecedented opportunity to resolve the full-length structure of fusion isoforms, but its high intrinsic error rates pose significant challenges to the precise identification of true fusion events. Here, we developed GFSeeker, an innovative splicing-graph-based computational framework for accurate gene fusion detection from long-read RNA-seq. GFSeeker employs a unique pipeline based on a splicing graph reference and a dual re-alignment validation to effectively overcome data noise from high error rates. Benchmarking across simulated, non-tumor, and cancer cell line datasets demonstrated GFSeeker's state-of-the-art performance, achieving 6%-15% higher F1 score compared to existing methods. Notably, GFSeeker successfully identified the known fusion event, MATN2-POP1, in the MCF-7 cancer cell line, missed by other tools, highlighting its superior sensitivity in resolving complex fusion events. These results validate GFSeeker as a powerful and reliable tool for gene fusion discovery, heralding its significant potential to advance cancer research and precision diagnostics.
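
For readers comparing fusion callers, the F1 score cited above is the harmonic mean of precision and recall. The sketch below computes it from counts of true positive, false positive, and false negative fusion calls; the counts in the example are made up and do not come from the GFSeeker benchmarks.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives.
    Equivalent to 2 * precision * recall / (precision + recall)."""
    denominator = 2 * tp + fp + fn
    # With no calls and no true events, define F1 as 0.0.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical caller: 8 correct fusions, 2 spurious, 2 missed.
    print(f1_score(tp=8, fp=2, fn=2))
```

Because F1 penalizes both spurious and missed calls, it is a natural summary for noisy long-read data, where a lenient caller gains recall at the cost of precision.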

Journal Article, volume 27, issue 1. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777712/pdf/

Citations: 0

GRAFT: a graph-aware fusion transformer for cancer driver gene prediction.

IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-01-07 DOI: 10.1093/bib/bbaf706

Sang-Pil Cho, Young-Rae Cho

Identifying cancer driver genes is essential for precision oncology, but existing computational methods are often limited by their reliance on single biological networks and their inability to capture long-range molecular dependencies. To address these challenges, we propose GRAFT, a Graph-Aware Fusion Transformer. This framework uses a multi-view graph encoder to learn modality-specific features from protein-protein interactions, pathway co-occurrence, and gene semantic similarity. These representations are further enriched with two auxiliary feature types: structural encodings derived from network topology and functional embeddings guided by curated gene sets. The integrated features are then processed by a transformer backbone, in which a novel edge-attention bias makes the model explicitly sensitive to the underlying graph topologies, enabling effective modeling of both local and global dependencies. Extensive evaluations demonstrate that GRAFT performs competitively with leading state-of-the-art methods in pan-cancer analysis, while consistently delivering superior predictive accuracy across numerous specific cancer types. More importantly, a functional enrichment analysis of the novel candidate driver genes predicted by our model confirms their strong associations with key cancer-related processes, demonstrating the model's ability to make biologically plausible discoveries. By delivering a powerful and interpretable framework, our model not only advances the identification of cancer driver genes but also establishes a robust paradigm for multimodal data integration in systems biology. The source code and datasets are publicly accessible at https://github.com/spcho-dev/GRAFT.
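The idea of an edge-attention bias can be illustrated with a minimal sketch. The abstract does not give the exact form GRAFT uses, so this example assumes the simplest variant: a scalar added to the attention logit of every gene pair that is connected in the graph. The function `edge_biased_attention`, the `edge_bias` parameter, and the toy inputs are all hypothetical and are not taken from the GRAFT repository.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def edge_biased_attention(q, k, v, adj, edge_bias=1.0):
    """Single-head attention with an additive bias on graph edges.

    Assumption: logit(i, j) = q_i . k_j / sqrt(d) + edge_bias * adj[i][j],
    so connected node pairs receive extra attention weight.
    """
    d = len(q[0])
    outputs, weights = [], []
    for i, qi in enumerate(q):
        logits = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + edge_bias * adj[i][j]
            for j, kj in enumerate(k)
        ]
        w = softmax(logits)
        weights.append(w)
        # Weighted sum of value vectors for node i.
        outputs.append([sum(w[j] * v[j][c] for j in range(len(v)))
                        for c in range(len(v[0]))])
    return outputs, weights


# Three toy "genes" with identical queries/keys, so only the graph bias differs.
q = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
adj = [[0, 1, 0],  # gene 0 -- gene 1 is the only edge
       [1, 0, 0],
       [0, 0, 0]]

out, w = edge_biased_attention(q, q, v, adj, edge_bias=1.0)
print([round(x, 3) for x in w[0]])
```

Because the content scores are identical here, gene 0 attends more strongly to its neighbor gene 1 than to the unconnected gene 2, while the isolated gene 2 attends uniformly. Attention still spans all nodes, which is how a bias of this kind can keep global dependencies while favoring local graph structure.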

Citations: 0