Charting the eukaryotic epitranscriptome by direct RNA sequencing is promising but still very challenging, as current bioinformatics tools are based on modification-unaware software and require multiple modification-specific learning steps. Here, we introduce NanoSpeech, a modification-aware basecaller for the ab initio simultaneous detection of multiple modified bases using a transformer model, and NanoListener, which implements a simulated randomers strategy for robust training datasets and a new generation of ONT basecallers. NanoListener and NanoSpeech are independent of the specific ONT chemistry. Once a training dataset has been created, a single model with an expanded vocabulary can accurately basecall both unmodified and modified bases.
{"title":"Ab initio detection of multiple epitranscriptomic modifications from Oxford nanopore technology direct RNA sequencing data.","authors":"Adriano Fonzino, Bruno Fosso, Grazia Visci, Carmela Gissi, Graziano Pesole, Ernesto Picardi","doi":"10.1093/bib/bbaf709","DOIUrl":"10.1093/bib/bbaf709","url":null,"abstract":"<p><p>Charting the eukaryotic epitranscriptome by direct RNA sequencing is promising but still very challenging, as current bioinformatics tools are based on modification-unaware software and require multiple modification-specific learning steps. Here, we introduce NanoSpeech, a modification-aware basecaller for the ab initio simultaneous detection of multiple modified bases using a transformer model, and NanoListener, which implements a simulated randomers strategy for robust training datasets and a new generation of ONT basecallers. NanoListener and NanoSpeech are independent of the specific ONT chemistry. Once a training dataset has been created, a single model with an expanded vocabulary can accurately basecall both unmodified and modified bases.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12848937/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146060091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Can Shi, Yumei Li, Jing Guo, Qiuling Chen, Tingting Cao, Sha Liao, Ao Chen, Mei Li, Ying Zhang
Recent advances in spatial omics technologies have enabled transcriptome profiling at subcellular resolution. By performing cell segmentation on nuclear or membrane staining images, researchers can acquire single-cell-level spatial gene expression data, which in turn enables subsequent biological interpretation. Although deep learning-based segmentation models achieve high overall accuracy, their performance remains suboptimal for whole-tissue analysis, particularly in ensuring consistent segmentation accuracy across diverse cell populations. Existing fine-tuning approaches often require extensive retraining or are tailored to specific model architectures, limiting their adaptability and scalability in practical settings. To address these challenges, we present CSRefiner, a lightweight and efficient fine-tuning framework for precise whole-tissue single-cell spatial expression analysis. Our approach supports fine-tuning of segmentation models widely used in spatial omics and achieves high accuracy with very limited annotated data. This study demonstrates CSRefiner's superior performance across various staining types and its compatibility with multiple mainstream models. Combining operational simplicity with robust accuracy, our framework offers a practical solution for real-world spatial transcriptomics applications.
{"title":"CSRefiner: a lightweight framework for fine-tuning cell segmentation models with small datasets.","authors":"Can Shi, Yumei Li, Jing Guo, Qiuling Chen, Tingting Cao, Sha Liao, Ao Chen, Mei Li, Ying Zhang","doi":"10.1093/bib/bbaf718","DOIUrl":"10.1093/bib/bbaf718","url":null,"abstract":"<p><p>Recent advances in spatial omics technologies have enabled transcriptome profiling at subcellular resolution. By performing cell segmentation on nuclear or membrane staining images, researchers can acquire single cell level spatial gene expression data, which in turn enables subsequent biological interpretation. Although deep learning-based segmentation models achieve high overall accuracy, their performance remains suboptimal for whole-tissue analysis, particularly in ensuring consistent segmentation accuracy across diverse cell populations. Existing fine-tuning approaches often require extensive retraining or are tailored to specific model architectures, limiting their adaptability and scalability in practical settings. To address these challenges, we present CSRefiner, a lightweight and efficient fine-tuning framework for precise whole-tissue single-cell spatial expression analysis. Our approach incorporates support for fine-tuning widely used segmentation models in the field of spatial omics, while achieving high accuracy with very limited annotated data. This study demonstrates CSRefiner's superior performance across various staining types and its compatibility with multiple mainstream models. Combining operational simplicity with robust accuracy, our framework offers a practical solution for real-world spatial transcriptomics applications.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796817/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145958922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yonglin Peng, Xinhua Liu, Jun Wu, Sang Lin, Shengxuan Zhan, Hua Li, Ju Wang, Xiaodong Zhao
Transcription factors (TFs) bind to specific sequences in the genome to regulate gene expression and specify cell states. TF binding sites (TFBSs) are cell type-specific, which can be attributed to epigenomic contexts. Comprehensive profiling of TFBSs across various cell types through experimental approaches is neither practical nor cost-effective. Accurately identifying cell type-specific TFBSs through computational approaches remains challenging. Here, we develop EpiXFormer, a novel transformer-based neural network for cell type-specific TFBS prediction. EpiXFormer achieves exceptional performance in predicting binding sites of DNA-binding proteins (DBPs) across a diverse collection of cell types. It models the effects of proximal and distal epigenomic information on DBP binding and learns the identified motifs of the examined TFs and their potential co-occurring proteins. Moreover, we demonstrate that EpiXFormer can infer pioneer factors during cell type transition and delineate the cell type-specific regulatory functions of TFs. Overall, EpiXFormer enables cell type-specific TFBS prediction in the examined cell lines and can be readily applied to other cell types of interest. It provides a robust, scalable framework for characterizing and interpreting multimodal genomic data.
{"title":"EpiXFormer: a cross-attention neural network for predicting cell type-specific transcription factor binding sites.","authors":"Yonglin Peng, Xinhua Liu, Jun Wu, Sang Lin, Shengxuan Zhan, Hua Li, Ju Wang, Xiaodong Zhao","doi":"10.1093/bib/bbaf721","DOIUrl":"10.1093/bib/bbaf721","url":null,"abstract":"<p><p>Transcription factors (TFs) bind to specific sequences in the genome to regulate gene expression and specify cell states. TF binding sites (TFBSs) are cell type-specific, which can be attributed to epigenomic contexts. Comprehensive profiling of TFBSs across various cell types through experimental approaches is neither practical nor cost-friendly. Accurately identifying cell type-specific TFBSs through computational approaches remains challenging. Here, we develop EpiXFormer, a novel transformer-based neural network for cell type-specific TFBS prediction. EpiXFormer achieves exceptional performance in predicting binding sites of DNA-binding proteins (DBPs) across a diverse collection of cell types. It models the effects of proximal and distal epigenomic information on DBP binding and learns the identified motifs of the examined TFs and their potential co-occurring proteins. Moreover, we demonstrate that EpiXFormer can infer pioneer factors during cell type transition and delineate the cell type-specific regulatory functions of TFs. Overall, EpiXFormer enables cell type-specific TFBS prediction in the examined cell lines and is readily applied to other cell types of interest. It provides a robust, scalable framework for characterizing and interpreting multimodal genomic data.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12796812/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145958944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The integration of single-cell multi-omics data can uncover the underlying regulatory basis of diverse cell types and states. However, single-cell data inherently suffer from high levels of noise, sparsity, and intercellular heterogeneity, which pose significant challenges to the accuracy and robustness of clustering algorithms. Most existing multi-omics clustering approaches focus primarily on integrating omics individuality and commonality across modalities, but they ignore both the extraction of diverse features from the low-dimensional representations before fusion and the smoothing consistency of those features after fusion. To address these issues, we propose scMUSCLE, a novel multi-subspace contrastive learning method with structural smoothness for single-cell multi-omics data clustering. First, scMUSCLE leverages the degree structure to enhance the structural diversity of each omics modality. Second, we perform multi-subspace contrastive learning to improve the exploration of diversity across multi-omics features. Third, we propose an adaptive graph convolution clustering module, which establishes an adaptive feedback mechanism between intra-cluster smoothness and the downstream clustering task. Extensive experiments on four benchmark multi-omics datasets demonstrate the effectiveness and robustness of the proposed method. The source code is available on GitHub: https://github.com/GodIsGad/scMUSCLE.
{"title":"Clustering single-cell multi-omics data via multi-subspace contrastive learning with structural smoothness.","authors":"Yun Ding, Yangzhen Jiang, Jing Wang, Dayu Tan, Yansen Su, Chunhou Zheng","doi":"10.1093/bib/bbag005","DOIUrl":"10.1093/bib/bbag005","url":null,"abstract":"<p><p>The integration of single-cell multi-omics data can uncover the underlying regulatory basis of diverse cell types and states. However, single-cell data inherently suffer from high levels of noise, sparsity, and intercellular heterogeneity, which pose significant challenges to the accuracy and robustness of clustering algorithms. Most existing multi-omics clustering approaches primarily focus on the integration of omics individuality and commonality across modalities, but they ignore the diverse feature extraction of the low-dimensional representation before the fusion of single-cell multi-omics data, and the feature smoothing consistency of the diverse features after the fusion of single-cell multi-omics data. In order to address above issues, we propose a novel multi-subspace contrastive learning with structural smoothness method for single-cell multi-omics data clustering (scMUSCLE), which is designed to address the challenges inherent in multi-omics data integration. First, the proposed scMUSCLE method leverages the degree structure to enhance structural diversity of each omics modality. Second, we perform multi-subspace contrastive learning to improve the diversity exploration across multi-omics features. Next, we propose an adaptive graph convolution clustering module, which establishes an adaptive feedback mechanism between intra-cluster smoothness and the downstream clustering task. Extensive experiments on four benchmark multi-omics datasets demonstrate the effectiveness and robustness. The source code can be found on the GitHub repository: https://github.com/GodIsGad/scMUSCLE.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834668/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146050305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Wu, Md Wahiduzzaman, Pengfei Yin, Puxuan Sun, Haoping Chen, Yongwen Ding, Jiankang Wang
Histone modifications (HMs) and transcription factors (TFs) are central to chromatin dynamics and transcriptional regulation. Conventional bulk approaches like ChIP-seq require large cell populations, limiting applicability to heterogeneous studies and tissue samples. In contrast, single-cell cleavage under targets and tagmentation (scCUT&Tag) and its variants have enabled high-resolution profiling of HMs and TFs for investigating gene regulatory mechanisms in individual cells, transformatively broadening single-cell epigenomics beyond chromatin accessibility measured by scATAC-seq. Despite rapid advances in scCUT&Tag-related methods and the accumulation of ~21 public datasets, a systematic overview of the current research status, especially the forefront of computational analysis and ensuing challenges, remains lacking. Here, we comprehensively overview current scCUT&Tag studies from a bioinformatics perspective. We catalog representative applications spanning diverse chromatin features, experimental designs, and data characteristics. We delineate a typical computational workflow from matrix generation to downstream functional annotations, emphasizing distinctions from scATAC-seq analysis, and highlighting critical analytical considerations. We extensively survey commonly used computational tools and key algorithms, compare analytical features between scCUT&Tag and scATAC-seq, and discuss major challenges in integrative analysis. This work provides a structured reference for understanding the current research landscape of scCUT&Tag and offers computational perspectives for researchers aiming to explore gene regulatory machinery at single-cell resolution.
{"title":"Advances in scCUT&Tag and computational analysis for single-cell gene regulatory element mapping.","authors":"Jun Wu, Md Wahiduzzaman, Pengfei Yin, Puxuan Sun, Haoping Chen, Yongwen Ding, Jiankang Wang","doi":"10.1093/bib/bbag015","DOIUrl":"10.1093/bib/bbag015","url":null,"abstract":"<p><p>Histone modifications (HMs) and transcription factors (TFs) are central to chromatin dynamics and transcriptional regulation. Conventional bulk approaches like ChIP-seq require large cell populations, limiting applicability to heterogeneous studies and tissue samples. In contrast, single-cell cleavage under targets and tagmentation (scCUT&Tag) and its variants have enabled high-resolution profiling of HMs and TFs for investigating gene regulatory mechanisms in individual cells, transformatively broadening single-cell epigenomics beyond chromatin accessibility measured by scATAC-seq. Despite rapid advances in scCUT&Tag-related methods and the accumulation of ~21 public datasets, a systematic overview of the current research status, especially the forefront of computational analysis and ensuing challenges, remains lacking. Here, we comprehensively overview current scCUT&Tag studies from a bioinformatics perspective. We catalog representative applications spanning diverse chromatin features, experimental designs, and data characteristics. We delineate a typical computational workflow from matrix generation to downstream functional annotations, emphasizing distinctions from scATAC-seq analysis, and highlighting critical analytical considerations. We extensively survey commonly used computational tools and key algorithms, compare analytical features between scCUT&Tag and scATAC-seq, and discuss major challenges in integrative analysis. This work provides a structured reference for understanding the current research landscape of scCUT&Tag and offers computational perspectives for researchers aiming to explore gene regulatory machinery at single-cell resolution.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853305/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146084189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cheng-Pei Lin, Yann-Jen Ho, Yen-Peng Chiu, Yun Tang, You Sheng Paik, Guan-Ting Chen, Wei-Chih Huang, Tzong-Yi Lee
Lung adenocarcinoma (LUAD), the most common subtype of non-small cell lung cancer, exhibits substantial molecular heterogeneity, complicating subtype classification, progression assessment, and treatment decision-making. Advances in high-throughput sequencing enable multi-omics analysis to reveal cancer mechanisms and biomarkers, yet the high dimensionality, heterogeneity, and interrelationships of omics layers such as transcriptome, microRNA expression, methylome, and copy number variation remain challenging to integrate through conventional methods. Most existing graph-based approaches represent patients as nodes, obscuring gene-level regulatory dynamics and limiting biological interpretability. To address this, we propose the Multi-omics Hierarchical Graph Neural Network (MoAGNN), a novel architecture that represents genes as nodes, integrates four omics layers, and leverages graph convolution with self-attention-based graph pooling to identify informative molecular nodes, thereby enhancing predictive performance and interpretability for LUAD subtype classification, tumor staging, and prognosis prediction. Using multi-omics datasets from The Cancer Genome Atlas (TCGA), MoAGNN achieved a test accuracy of 0.89 for LUAD subtype classification, outperforming conventional models (Random Forest, Support Vector Machine, and Multi-Layer Perceptron) as well as two state-of-the-art graph-based models: MoGCN, a multi-omics integration model based on a graph convolutional network, and MOGLAM, an end-to-end interpretable multi-omics integration method. Furthermore, we validated the generalizability of this framework on the GSE81089 dataset, demonstrating its potential applicability to clinically relevant risk assessment. Subsequent functional enrichment and survival analyses validated the biological relevance of the key genes identified by MoAGNN, supporting their potential roles in LUAD progression and suggesting the broader applicability of this framework in multi-omics cancer research.
{"title":"MoAGNN: a multi-omics hierarchical graph neural network for subtype classification and prognosis prediction in lung adenocarcinoma.","authors":"Cheng-Pei Lin, Yann-Jen Ho, Yen-Peng Chiu, Yun Tang, You Sheng Paik, Guan-Ting Chen, Wei-Chih Huang, Tzong-Yi Lee","doi":"10.1093/bib/bbaf735","DOIUrl":"10.1093/bib/bbaf735","url":null,"abstract":"<p><p>Lung adenocarcinoma (LUAD), the most common subtype of nonsmall cell lung cancer, exhibits substantial molecular heterogeneity, complicating subtype classification, progression assessment, and treatment decision-making. Advances in high-throughput sequencing enable multi-omics analysis to reveal cancer mechanisms and biomarkers, yet the high dimensionality, heterogeneity, and interrelationships of omics layers such as transcriptome, microRNA expression, methylome, and copy number variation remain challenging to integrate through conventional methods. Most existing graph-based approaches represent patients as nodes, obscuring gene-level regulatory dynamics and limiting biological interpretability. To address this, we propose the Multi-omics Hierarchical Graph Neural Network (MoAGNN), a novel architecture that represents genes as nodes, integrates four omics, and leverages graph convolution with self-attention-based graph pooling to identify informative molecular nodes, thereby enhancing predictive performance and interpretability for LUAD subtype classification, tumor staging, and prognosis prediction. Multi-omics datasets from The Cancer Genome Atlas (TCGA) were used and results showed that MoAGNN achieved a test accuracy of 0.89 for LUAD subtype classification, outperforming conventional models (Random Forest, Support Vector Machine and Multi-Layer Perceptron) as well as state-of-the-art graph-based models MoGCN, a multi-omics integration model based on graph convolutional network, and MOGLAM, an end-to-end interpretable multi-omics integration method. Furthermore, we validated the generalizability of this framework on the GSE81089 dataset, demonstrating its potential applicability to clinically relevant risk assessment. Subsequent functional enrichment and survival analyses validated the biological relevance of the key genes identified by MoAGNN, supporting their potential roles in LUAD progression, and suggesting the broader applicability of this framework in multi-omics cancer research.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814971/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146003031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We consider identifying a small yet meaningful set of active mediators from a high-dimensional pool of potential mediators, commonly derived from "-omics" or imaging data. In these contexts, mediators are often correlated or exhibit network structure, which presents a unique opportunity to improve efficacy by using this valuable information. To this end, we develop a Bayesian method that accommodates both high dimensionality and correlations among the mediators. Our approach flexibly learns the interconnections between the mediators while improving estimation accuracy by incorporating external knowledge about these relationships. Simulation studies demonstrate the effectiveness of the proposed method compared with alternative approaches. The analysis of environmental toxicity data provides new insights into the intermediate effects of molecular-level traits.
{"title":"BHMnet: Bayesian high-dimensional mediation analysis with network information integration for correlated mediators.","authors":"Yunju Im, Yuan Huang","doi":"10.1093/bib/bbaf734","DOIUrl":"10.1093/bib/bbaf734","url":null,"abstract":"<p><p>We consider identifying a small yet meaningful set of active mediators from a high-dimensional pool of potential mediators, commonly derived from \"-omics\" or imaging data. In these contexts, mediators are often correlated or exist network structures, which present unique opportunities to improve efficacy by using this valuable information. To this aim, we develop a Bayesian method that accommodates both high dimensionality and correlations among the mediators. Our approach flexibly learns the interconnection between the mediators while improving estimation accuracy by incorporating external knowledge about these relationships. Simulation studies demonstrate the effectiveness of the proposed method compared with alternative approaches. The analysis of the environmental toxicity data provides new insights into the intermediate effects of molecular-level traits.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814977/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146003064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Youshu Cheng, Chen Lin, Hongyu Li, Ke Xu, Hongyu Zhao
Statistical deconvolution methods offer a powerful solution for estimating cell-type-specific (CTS) profiles from readily available bulk tissue data. However, a critical limitation of existing methods is that they require knowledge of the cell type proportions of individuals in the bulk data. Because the true cell type proportions in bulk samples are unknown, those methods use estimated proportions to approximate them, which potentially introduces additional uncertainty in the inferred CTS profiles. To address this challenge, we propose Uncertainty-aware Bayesian Deconvolution (UBD) to incorporate uncertainty in cell type proportion estimates. By explicitly modeling the uncertainty in the initial estimates, UBD refines cell type proportions and estimates sample-level CTS data simultaneously. We show through extensive simulations that UBD can improve the estimates of CTS profiles. We further demonstrate the utility of UBD in revealing more CTS signals in applications to two real datasets.
{"title":"UBD: incorporating uncertainty in cell type proportion estimates from bulk samples to infer cell-type-specific profiles.","authors":"Youshu Cheng, Chen Lin, Hongyu Li, Ke Xu, Hongyu Zhao","doi":"10.1093/bib/bbaf711","DOIUrl":"10.1093/bib/bbaf711","url":null,"abstract":"<p><p>Statistical deconvolution methods offer a powerful solution for estimating cell-type-specific (CTS) profiles from readily available bulk tissue data. However, a critical limitation of existing methods is that they require the knowledge of cell type proportions of individuals in the bulk data. While the ground truth of cell type proportions in bulk samples are unknown, those methods use the estimated proportions to approximate the truth, which potentially introduces additional uncertainties in the inferred CTS profiles. To address this challenge, we propose Uncertainty-aware Bayesian Deconvolution (UBD) to incorporate uncertainty in cell type proportion estimates. By explicitly modeling the uncertainty in the initial estimates, UBD refines cell type proportions and estimates sample-level CTS data simultaneously. We show that UBD can improve the estimates of CTS profiles through extensive simulations. We further demonstrate the utility of UBD to reveal more CTS signals in its applications to two real datasets.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145948556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bingyan Wang, Heng Hu, Runtian Gao, Guohua Wang, Tao Jiang
Gene fusions are critical oncogenic drivers and therapeutic targets in diverse cancers. Long-read ribonucleic acid sequencing (RNA-seq) offers an unprecedented opportunity to resolve the full-length structure of fusion isoforms, but its high intrinsic error rates pose significant challenges to the precise identification of true fusion events. Here, we developed GFSeeker, an innovative splicing-graph-based computational framework for accurate gene fusion detection from long-read RNA-seq. GFSeeker employs a unique pipeline based on a splicing graph reference and a dual re-alignment validation to effectively overcome data noise from high error rates. Benchmarking across simulated, non-tumor, and cancer cell line datasets demonstrated GFSeeker's state-of-the-art performance, with F1 scores 6%-15% higher than those of existing methods. Notably, GFSeeker successfully identified the known fusion event MATN2-POP1 in the MCF-7 cancer cell line, which other tools missed, highlighting its superior sensitivity in resolving complex fusion events. These results validate GFSeeker as a powerful and reliable tool for gene fusion discovery, heralding its significant potential to advance cancer research and precision diagnostics.
{"title":"GFSeeker: a splicing-graph-based approach for accurate gene fusion detection from long-read RNA sequencing data.","authors":"Bingyan Wang, Heng Hu, Runtian Gao, Guohua Wang, Tao Jiang","doi":"10.1093/bib/bbaf702","DOIUrl":"10.1093/bib/bbaf702","url":null,"abstract":"<p><p>Gene fusions are critical oncogenic drivers and therapeutic targets in diverse cancers. Long-read ribonucleic acid sequencing (RNA-seq) offers an unprecedented opportunity to resolve the full-length structure of fusion isoforms, but its high intrinsic error rates pose significant challenges to the precise identification of true fusion events. Here, we developed GFSeeker, an innovative splicing-graph-based computational framework for accurate gene fusion detection from long-read RNA-seq. GFSeeker employs a unique pipeline based on a splicing graph reference and a dual re-alignment validation to effectively overcome data noise from high error rates. Benchmarking across simulated, non-tumor, and cancer cell line datasets demonstrated GFSeeker's state-of-the-art performance, achieving 6%-15% higher F1 score compared to existing methods. Notably, GFSeeker successfully identified the known fusion event, MATN2-POP1, in the MCF-7 cancer cell line, missed by other tools, highlighting its superior sensitivity in resolving complex fusion events. These results validate GFSeeker as a powerful and reliable tool for gene fusion discovery, heralding its significant potential to advance cancer research and precision diagnostics.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777712/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145917105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying cancer driver genes is essential for precision oncology, but existing computational methods are often limited by their reliance on single biological networks and their inability to capture long-range molecular dependencies. To address these challenges, we propose GRAFT, a Graph-Aware Fusion Transformer. This framework learns modality-specific features from protein-protein interactions, pathway co-occurrence, and gene semantic similarity using a multi-view graph encoder. These representations are further enriched with two auxiliary feature types: structural encodings derived from network topology and functional embeddings guided by curated gene sets. The integrated features are then processed by a transformer backbone, where a novel edge-attention bias makes the model explicitly sensitive to the underlying graph topologies, enabling the effective modeling of both local and global dependencies. Extensive evaluations demonstrate that GRAFT achieves competitive performance with leading state-of-the-art methods in pan-cancer analysis, while consistently delivering superior predictive accuracy across numerous specific cancer types. More importantly, a functional enrichment analysis of the novel candidate driver genes predicted by our model confirms their strong associations with key cancer-related processes, demonstrating the model's ability to make biologically plausible discoveries. By delivering a powerful and interpretable framework, our model not only advances the identification of cancer driver genes but also establishes a robust paradigm for multimodal data integration in systems biology. The source code and datasets are publicly accessible at https://github.com/spcho-dev/GRAFT.
{"title":"GRAFT: a graph-aware fusion transformer for cancer driver gene prediction.","authors":"Sang-Pil Cho, Young-Rae Cho","doi":"10.1093/bib/bbaf706","DOIUrl":"10.1093/bib/bbaf706","url":null,"abstract":"<p><p>Identifying cancer driver genes is essential for precision oncology, but existing computational methods are often limited by their reliance on single biological networks and their inability to capture long-range molecular dependencies. To address these challenges, we propose GRAFT, a Graph-Aware Fusion Transformer. This framework learns modality-specific features from protein-protein interactions, pathway co-occurrence, and gene semantic similarity using a multi-view graph encoder. These representations are further enriched with two auxiliary feature types: structural encodings derived from network topology and functional embeddings guided by curated gene sets. The integrated features are then processed by a transformer backbone, where a novel edge-attention bias makes the model explicitly sensitive to the underlying graph topologies, enabling the effective modeling of both local and global dependencies. Extensive evaluations demonstrate that GRAFT achieves competitive performance with leading state-of-the-art methods in pan-cancer analysis, while consistently delivering superior predictive accuracy across numerous specific cancer types. More importantly, a functional enrichment analysis of the novel candidate driver genes predicted by our model confirms their strong associations with key cancer-related processes, demonstrating the model's ability to make biologically plausible discoveries. By delivering a powerful and interpretable framework, our model not only advances the identification of cancer driver genes but also establishes a robust paradigm for multimodal data integration in systems biology. The source code and datasets are publicly accessible at https://github.com/spcho-dev/GRAFT.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 1","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790624/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145948471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}