Briefings in Bioinformatics: latest articles

Detection of alternative splicing: deep sequencing or deep learning?
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf705
Lena Maria Hackl, Fabian Neuhaus, Sabine Ameling, Uwe Völker, Jan Baumbach, Olga Tsoy

Alternative splicing is a crucial mechanism of gene regulation that enables condition- and tissue-specific expression of gene isoforms. Its dysregulation plays a role in various diseases such as cancer, neurological disorders, and metabolic conditions. Despite its importance, accurate detection of alternative splicing events remains challenging. Comprehensive alternative splicing event detection typically requires deep sequencing with over 100 million reads; however, much of the publicly accessible RNA sequencing data is of lower sequencing depth. Recent advances, particularly deep learning models working with genomic sequences, offer new avenues for predicting alternative splicing without reliance on high sequencing depth data. Our study addresses the question: Can we utilize the vast repository of publicly available RNA sequencing data for comprehensive alternative splicing detection, despite the low sequencing depth? Our results demonstrate the potential of sequence-based deep learning tools such as AlphaGenome, SpliceAI and DeepSplice for initial hypothesis development and as additional filters in standard RNA sequencing pipelines, especially when sequencing depth is limited. Nonetheless, validation with higher sequencing depths remains essential for confirmation of splice events. Overall, our findings underscore the need for integrative methods combining genomic sequence data and RNA sequencing data for the prediction of tissue- and condition-specific alternative splicing in resource-limited settings.

Briefings in Bioinformatics, volume 27, issue 1. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12790623/pdf/
Citations: 0
Cross-ancestry information transfer framework improves protein abundance prediction and protein-trait association identification.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf707
Wenli Zhai, Lingyun Sun, Wenwei Fang, Yidan Dong, Chunxiao Cheng, Yuanjiao Liu, Yuan Zhou, Jiadong Ji, Lang Wu, An Pan, Eric R Gamazon, Xiong-Fei Pan, Dan Zhou

Genetics-informed proteome-wide association studies (PWASs) provide an effective way to uncover proteomic mechanisms underlying complex diseases. PWAS relies on an ancestry-matched reference panel to model the impact of genetically determined protein expression on phenotype. However, reference panels from underrepresented populations remain relatively limited. We developed a multi-ancestry framework to enhance protein prediction in these populations by integrating diverse information-sharing strategies into a Multi-Ancestry Best-performing Model (MABM). MABM improved prediction performance in both cross-validation and an external dataset. Leveraging BioBank Japan, we identified three times as many significant PWAS associations using MABM as using the Lasso model. Notably, 47.5% of the MABM-specific associations were reproduced in independent East Asian datasets with concordant effect sizes. Furthermore, MABM enhanced decision-making in gene/protein prioritization for functional validation of complex traits by validating well-established associations and uncovering novel trait-related candidates. The benefits of MABM were further validated in additional ancestries and demonstrated in brain tissue-based PWAS, underscoring its broad applicability. Our findings close critical gaps in multi-omics research and facilitate trait-relevant protein discovery in underrepresented populations.
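
The "best-performing model" idea can be pictured with a small sketch. The abstract does not say how the best model is chosen, so this assumes the selection is made per protein by cross-validated R^2; the strategy names, protein labels, and scores are hypothetical placeholders, not values from the paper.

```python
from typing import Dict


def pick_best_model(cv_r2: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each protein, return the strategy with the highest cross-validated R^2.

    Assumption: selection by cross-validated R^2 is an illustrative choice,
    not a criterion confirmed by the abstract.
    """
    best = {}
    for protein, scores in cv_r2.items():
        # max over strategy names, ranked by their score
        best[protein] = max(scores, key=scores.get)
    return best


# Toy, made-up scores (hypothetical strategies and proteins)
cv_r2 = {
    "PROT_A": {"target_only_lasso": 0.05, "pooled": 0.08, "transfer": 0.12},
    "PROT_B": {"target_only_lasso": 0.20, "pooled": 0.15, "transfer": 0.18},
}
best = pick_best_model(cv_r2)
print(best)
```

In this toy setup, a protein that is poorly predicted from the target ancestry alone can still pick up a better model from a strategy that borrows information across ancestries.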

Briefings in Bioinformatics, volume 27, issue 1. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777707/pdf/
Citations: 0
iceDP: identifying inter-chromatin engagement via density peaks clustering algorithm.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf704
Ruhai Chen, Jiekai Chen, Lingling Shi, Jiangping He

Chromatin topological structure is critical for gene regulation. Hi-C based experiments have significantly advanced our understanding of chromatin organization. Numerous computational tools have been developed to identify various structural levels of chromatin, ranging from compartments to loops. However, there remains a lack of specialized tools for identifying non-homologous inter-chromatin contacts (NHCCs), which play important roles in chromosome territories. In this study, we present iceDP, a tool that leverages the Density Peaks clustering algorithm to identify local high-density regions of inter-chromatin contacts. These regions undergo two subsequent filtering steps to eliminate obvious false positives. When applied to three Hi-C datasets, iceDP accurately identified known NHCCs, including olfactory receptor genes in mature olfactory sensory neurons and Polycomb repressive complex-regulated developmental genes in mouse embryonic stem cells (mESCs). Notably, iceDP also uncovered previously unreported transcriptionally active NHCCs. Compared to diffHiC and FitHiC, iceDP exhibited superior performance with the highest positive rate. Moreover, iceDP is compatible with a wide range of chromatin conformation capture techniques, including in-situ Hi-C, Micro-C, HiChIP, and BL-HiC, demonstrating its versatility and utility.
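
For readers unfamiliar with Density Peaks clustering, the core computation can be sketched as follows. This is a generic, minimal version of the algorithm and not iceDP's implementation: treating each contact as a 2D point (bin on one chromosome, bin on the other), the cutoff `d_c`, and the toy coordinates are all assumptions made for illustration.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than d_c (local density)
    delta[i] = distance to the nearest point ranked denser than i;
               the densest point gets its maximum distance instead.
    Ties in rho are broken by index so exactly one point is ranked densest.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


# Toy contact coordinates (hypothetical): two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
rho, delta = density_peaks(points, d_c=1.5)

# Candidate centers have both high density and high delta
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)[:2]
print(centers)
```

Points that are both dense and far from any denser point stand out as candidate cluster centers; the two filtering steps mentioned in the abstract would come after this stage and are not modeled here.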

Briefings in Bioinformatics, volume 27, issue 1. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777978/pdf/
Citations: 0
CircRM: profiling circular RNA modifications from nanopore direct RNA sequencing.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf726
Jiayi Li, Shenglun Chen, Zhixing Wu, Haozhe Wang, Rong Xia, Jia Meng, Yuxin Zhang

Circular RNA (circRNA) represents a critical class of regulatory RNAs with distinctive structural and functional features. The functions of circRNAs are modulated by various RNA modifications. Here, we present CircRM, a nanopore direct RNA sequencing-based computational method for profiling RNA modifications in circRNAs at single-base and single-molecule resolution. By integrating circRNA detection, read-level modification detection, and quantitative assessment of methylation rates, CircRM identified 427 high-confidence circRNAs and enabled systematic characterization of three major modifications: m5C (AUC = 0.855), m6A (AUC = 0.817), and m1A (AUC = 0.769). It revealed distinct modification patterns compared with linear RNAs, highlighting RNA-type-specific regulation. We also identified key features of circRNA-specific modifications, such as enrichment near the back-splice junctions. Cross-cell line analyses further demonstrated conserved and cell-type-specific modification patterns. Together, these findings reveal, at the computational level, a unique epitranscriptomic landscape associated with circRNAs and establish CircRM as a powerful tool for advancing the study of RNA modifications in circular RNA biology. CircRM is freely accessible at: https://github.com/jiayiAnnie17/CircRM.
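
One way to picture how read-level calls become a site-level methylation rate is the sketch below. It is not CircRM's code: the input format (site label plus a per-read modification probability), the 0.5 threshold, and the site names are assumptions chosen for illustration.

```python
from collections import defaultdict


def methylation_rates(read_calls, threshold=0.5, min_coverage=1):
    """Aggregate per-read modification probabilities into per-site rates.

    read_calls: iterable of (site, probability) pairs, one per read.
    A read counts as modified when probability >= threshold (assumed cutoff).
    Sites covered by fewer than min_coverage reads are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [modified reads, total reads]
    for site, prob in read_calls:
        counts[site][1] += 1
        if prob >= threshold:
            counts[site][0] += 1
    return {site: mod / total
            for site, (mod, total) in counts.items()
            if total >= min_coverage}


# Toy read-level calls (hypothetical sites and probabilities)
read_calls = [
    ("circA:120", 0.9), ("circA:120", 0.2), ("circA:120", 0.7), ("circA:120", 0.8),
    ("circB:45", 0.1), ("circB:45", 0.3),
]
rates = methylation_rates(read_calls)
print(rates)
```

Raising `min_coverage` trades the number of reported sites for more stable rate estimates, which matters at the modest read depths typical of direct RNA sequencing.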

Briefings in Bioinformatics, volume 27, issue 1. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12798809/pdf/
Citations: 0
A novel two-sample Mendelian randomization framework integrating common and rare variants: application to assess the effect of HDL-C on preeclampsia risk.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf649
Yu Zhang, Ming Li, David M Haas, C Noel Bairey Merz, Tsegaselassie Workalemahu, Kelli Ryckman, Janet M Catov, Lisa D Levine, Alexa Freedman, George R Saade, Jiaqi Hu, Hongyu Zhao, Xihao Li, Nianjun Liu, Qi Yan

Mendelian randomization (MR) has become an important technique for establishing causal relationships between risk factors and health outcomes. By using genetic variants as instrumental variables, it can mitigate bias due to confounding and reverse causation in observational studies. Current MR analyses have predominantly used common genetic variants as instruments, which represent only part of the genetic architecture of complex traits. Rare variants, which can have larger effect sizes and provide unique biological insights, have been understudied due to statistical and methodological challenges. We introduce MR-common and annotation-informed rare variants (MR-CARV), a novel framework integrating common and rare genetic variants in two-sample MR. This method leverages comprehensive genetic data made available by high-throughput sequencing technologies and large-scale consortia. Rare variants are aggregated into functional categories, such as gene-coding, gene-noncoding, and nongene regions, using variant annotations and biological impact as weights. The effects of rare variant sets are then estimated with STAARpipeline and combined with the estimated effects of common variants by existing MR methods. Simulation studies demonstrate that MR-CARV maintains robust type I error control and achieves higher statistical power, with up to a 66.3% relative increase compared with existing methods based only on common variants. Consistent with these findings, application to real data on high-density lipoprotein cholesterol (HDL-C) and preeclampsia showed that MR-CARV [inverse variance weighted (IVW)] yielded a more precise and statistically significant effect estimate (-0.020, SE = 0.0102, $P$ = 0.0470) than IVW using only common variants (-0.023, SE = 0.0123, $P$ = 0.0659).
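
The IVW estimator named in the abstract has a simple closed form, sketched below as the standard fixed-effect version with first-order weights. Treating each aggregated rare-variant set as one extra instrument alongside the common variants is an assumption made here for illustration, not a description of MR-CARV's internals, and the summary statistics are made up.

```python
import math


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_x: instrument-exposure effects
    beta_y: instrument-outcome effects
    se_y:   standard errors of beta_y
    Returns (estimate, standard error, two-sided normal p-value).
    """
    numer = sum(bx * by / (sy * sy) for bx, by, sy in zip(beta_x, beta_y, se_y))
    denom = sum(bx * bx / (sy * sy) for bx, sy in zip(beta_x, se_y))
    beta = numer / denom
    se = math.sqrt(1.0 / denom)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p under a normal approximation
    return beta, se, p


# Toy summary statistics (hypothetical): two common variants plus one
# aggregated rare-variant set treated as a third instrument
beta_x = [0.1, 0.2, 0.3]
beta_y = [0.02, 0.04, 0.06]
se_y = [0.01, 0.01, 0.01]
beta, se, p = ivw_estimate(beta_x, beta_y, se_y)
print(beta, se, p)
```

Each added instrument increases the denominator sum, so the standard error can only shrink. This matches the direction of the abstract's result, where adding rare variants moved the SE from 0.0123 to 0.0102.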

Briefings in Bioinformatics, volume 27, issue 1. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12777983/pdf/
Citations: 0
Multi-omics data integration for enhanced cancer subtyping via interactive multi-kernel learning.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-01 DOI: 10.1093/bib/bbaf687
Hongyan Cao, Tong Wang, Zhaoyang Xu, Xin Zhao, Gaiqin Liu, Xiaoling Yang, Ruiling Fang, Yanhong Luo, Ping Zeng, Hongmei Yu, Yanbo Zhang, Yuehua Cui

Cancer is a highly heterogeneous disease characterized by complex molecular changes. Subtypes identified through multi-omics data hold significant promise for improving prognosis and facilitating personalized precision treatment. Recent multi-omics integration methods have mostly focused on capturing complementary information from different data types, often overlooking potential interactions between omics data. Here we develop a novel method named interactive multi-kernel learning (iMKL), which incorporates omics-omics interactions alongside heterogeneous data types under the unsupervised multi-kernel learning framework, to improve subtype identification. Using the sample-similarity kernel for each dataset, we propose a joint Hadamard product strategy to capture higher-order interactive effects from different omics data types. We applied iMKL to two renal cell carcinoma (RCC) datasets, clear cell renal cell carcinoma (ccRCC) and type II papillary renal cell carcinoma (type II pRCC), both including miRNA expression, mRNA expression, and DNA methylation data. Stability analysis through random sampling of patients or features demonstrated that iMKL exhibits strong robustness and accuracy in identifying patient subtypes. The identified subtypes revealed dramatic differences in patient survival, with both ccRCC and type II pRCC classified into three distinct subtypes. The findings in the real application highlight potential biomarkers associated with adverse patient outcomes and demonstrate substantial advancement in cancer subtype identification. The iMKL method effectively identifies tumor molecular subtypes that are strongly associated with clinical features and survival rates, providing valuable insights for accurate cancer subtyping, clinical decision-making, and the realization of personalized treatment strategies.
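
The Hadamard product strategy can be illustrated with a small sketch: build one sample-similarity kernel per omics layer, then multiply them element by element to get an interaction kernel. The Gaussian kernel, the bandwidth, and the toy values are assumptions for illustration; the abstract does not specify iMKL's kernel choice or how the interaction kernels enter the learning objective.

```python
import math


def gaussian_kernel(X, sigma=1.0):
    """Sample-by-sample Gaussian similarity matrix for one omics layer."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))
             for y in X] for x in X]


def hadamard(K1, K2):
    """Element-wise (Hadamard) product of two kernel matrices of equal shape."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(K1, K2)]


# Toy data (hypothetical): 3 patients, one feature per omics layer
mrna = [[0.0], [0.0], [2.0]]
methylation = [[0.0], [1.0], [1.0]]

K_mrna = gaussian_kernel(mrna)
K_meth = gaussian_kernel(methylation)
K_int = hadamard(K_mrna, K_meth)  # interaction kernel between the two layers
print(K_int[0][1])
```

Because each entry is a product, two patients score as similar in the interaction kernel only when they are similar in both layers. The element-wise product of positive semidefinite kernels is again positive semidefinite (Schur product theorem), so the result remains a valid kernel.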

Briefings in Bioinformatics, volume 26, issue 6. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12710476/pdf/
Citations: 0
Identification of key candidate genes for ovarian cancer using integrated statistical and machine learning approaches.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-01 DOI: 10.1093/bib/bbaf602
Md Ali Hossain, Tania Akter Asa, Md Shofiqul Islam, Mohammad Zahidur Rahman, Mohammad Ali Moni

Ovarian cancer (OC) is a highly lethal malignancy worldwide, necessitating the identification of key genes to uncover its molecular mechanisms and improve diagnostic and therapeutic strategies. This study utilized statistical and machine learning approaches to identify key candidate genes for OC. Three microarray datasets were obtained from the Gene Expression Omnibus (GEO) database, and analysis began with normalization and differential gene expression analysis using the limma package. Highly discriminative differentially expressed genes (HDDEGs) were identified through a support vector machine-based approach, yielding 84 overlapping HDDEGs across the datasets. Enrichment analysis of HDDEGs was conducted using DAVID. A protein-protein interaction network constructed via STRING pinpointed central hub genes using CytoHubba metrics. Significant modules were analyzed with Molecular Complex Detection (MCODE), identifying 18 central hub genes, 11 hub module genes, and 54 meta-hub genes. The intersection of these three gene sets revealed eight shared key genes (FANCD2, BUB1B, BUB1, KIF4A, DTL, NCAPG, KIF20A, and UBE2C). Weighted gene co-expression network analysis identified key modules linked to clinical traits and confirmed that the eight key candidate genes group into a single cluster. These genes were validated using two independent datasets (GSE38666 and TCGA-OC), with area under the curve and survival analyses underscoring their predictive and prognostic significance in OC. This integrative approach advances understanding of OC's molecular basis, identifies potential biomarkers, and emphasizes the clinical relevance of the eight key candidate genes for OC diagnosis, prognosis, and treatment.
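
The final selection step, intersecting the three hub gene sets, can be sketched in a few lines. Only the eight shared genes are named in the abstract; the remaining set members below are hypothetical placeholders, so the set sizes here do not match the reported 18, 11, and 54.

```python
# The eight shared genes reported in the abstract
shared = {"FANCD2", "BUB1B", "BUB1", "KIF4A", "DTL", "NCAPG", "KIF20A", "UBE2C"}

# Placeholder members stand in for the genes not listed in the abstract
central_hub = shared | {"PLACEHOLDER_C1", "PLACEHOLDER_C2"}
hub_module = shared | {"PLACEHOLDER_M1"}
meta_hub = shared | {"PLACEHOLDER_C1", "PLACEHOLDER_H1", "PLACEHOLDER_H2"}

# Keep only genes supported by all three analyses
key_genes = sorted(central_hub & hub_module & meta_hub)
print(key_genes)
```

Requiring membership in all three sets is a conservative filter: a gene that ranks highly under only one or two criteria, like `PLACEHOLDER_C1` here, is excluded.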

Citations: 0
Systematic evaluation of predictors for binding free energy changes upon mutations in protein complexes.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-01 DOI: 10.1093/bib/bbaf645
Yu Zhang, Yunjiong Liu, Yulin Zhang, Ziyang Wang, Xiaoli Lu, Shengxiang Ge, Xiaoping Min

The prediction of binding free energy changes ($\Delta\Delta G$) caused by mutations in protein complexes is crucial for understanding disease mechanisms and designing antibodies. Approximately 60% of pathogenic missense mutations lead to functional abnormalities by disrupting molecular interactions. However, although existing $\Delta\Delta G$ predictors exhibit strong performance in benchmarks, they suffer from inadequate generalization, a misalignment between evaluation metrics and practical needs, and poor adaptability to complex mutation scenarios. This study systematically assessed eight mainstream predictors, covering both physical energy function-based and machine learning-based methods, and constructed an independent evaluation set. It employed multi-dimensional metrics, including regression accuracy and classification capability, and also analyzed how predictor performance varies across mutation types, stability categories, and microenvironments of protein mutation sites. The results indicate that more than 60% of the predictors (5 out of 8) exhibit a systematic bias toward overestimating mutational instability. In the three-class classification task, predictors demonstrate a limited ability to identify stabilizing mutations ($\Delta\Delta G < -0.5$ kcal/mol), with recall rates below 0.1 for this class, and overall predictive efficacy depends on the local protein structure. In summary, this study reveals the limitations of current $\Delta\Delta G$ predictors in terms of generalization and adaptability to complex scenarios, thus providing a reference for the optimization and practical application of $\Delta\Delta G$ prediction methods. It suggests that future breakthroughs can be achieved by constructing balanced and standardized datasets alongside developing local-global fusion algorithms.
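To make the three-class evaluation concrete, the sketch below bins ΔΔG values into classes and computes per-class recall. Only the stabilizing cutoff (ΔΔG < -0.5 kcal/mol) comes from the abstract; the symmetric destabilizing cutoff (ΔΔG > 0.5 kcal/mol), the function names, and the toy values are assumptions for illustration and are not the study's data or code.

```python
def classify_ddg(ddg, threshold=0.5):
    """Bin a ddG value (kcal/mol) into one of three stability classes.

    The stabilizing cutoff (ddG < -0.5) follows the abstract; the symmetric
    destabilizing cutoff (ddG > +0.5) is an assumption made for this sketch.
    """
    if ddg < -threshold:
        return "stabilizing"
    if ddg > threshold:
        return "destabilizing"
    return "neutral"


def recall_per_class(true_ddg, pred_ddg, threshold=0.5):
    """Recall for each class: correctly predicted / all true members."""
    classes = ("stabilizing", "neutral", "destabilizing")
    hits = dict.fromkeys(classes, 0)
    totals = dict.fromkeys(classes, 0)
    for t, p in zip(true_ddg, pred_ddg):
        true_class = classify_ddg(t, threshold)
        totals[true_class] += 1
        if classify_ddg(p, threshold) == true_class:
            hits[true_class] += 1
    # Classes with no true members have undefined recall (NaN).
    return {c: hits[c] / totals[c] if totals[c] else float("nan") for c in classes}


# Toy values (hypothetical, not from the study).
experimental = [-1.2, -0.8, 0.1, 0.9, 2.0]
predicted = [0.3, -0.6, 0.0, 1.5, 0.4]
recalls = recall_per_class(experimental, predicted)
print(recalls)
```

A predictor that rarely outputs values below -0.5 will score a low stabilizing recall even when its overall regression error looks acceptable, which is the kind of gap between metrics and practical needs the abstract describes.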
Citations: 0
ProtoMol: enhancing molecular property prediction via prototype-guided multimodal learning.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-01 DOI: 10.1093/bib/bbaf629
Yingxu Wang, Kunyu Zhang, Jiaxin Huang, Nan Yin, Siwei Liu, Eran Segal

Multimodal molecular representation learning, which jointly models molecular graphs and their textual descriptions, enhances predictive accuracy and interpretability by enabling more robust and reliable predictions of drug toxicity, bioactivity, and physicochemical properties through the integration of structural and semantic information. However, existing multimodal methods suffer from two key limitations: (i) they typically perform cross-modal interaction only at the final encoder layer, thus overlooking hierarchical semantic dependencies; (ii) they lack a unified prototype space for robust alignment between modalities. To address these limitations, we propose ProtoMol, a prototype-guided multimodal framework that enables fine-grained integration and consistent semantic alignment between molecular graphs and textual descriptions. ProtoMol incorporates dual-branch hierarchical encoders, utilizing Graph Neural Networks to process structured molecular graphs and Transformers to encode unstructured texts, resulting in comprehensive layer-wise representations. Then, ProtoMol introduces a layer-wise bidirectional cross-modal attention mechanism that progressively aligns semantic features across layers. Furthermore, a shared prototype space with learnable, class-specific anchors is constructed to guide both modalities toward coherent and discriminative representations. Extensive experiments on multiple benchmark datasets demonstrate that ProtoMol consistently outperforms state-of-the-art baselines across a variety of molecular property prediction tasks. Our source code is available at: https://github.com/zky04/Protomol.
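The idea of layer-wise bidirectional cross-modal attention can be illustrated with a minimal NumPy sketch. This is not ProtoMol's implementation (see the authors' repository for that): it omits learned projections, multi-head structure, and the prototype space, and the function names, feature sizes, and random inputs are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention from one modality to the other.

    No learned projection matrices are used here; a real model would
    project queries, keys, and values before this step.
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values, weights


def bidirectional_layer(graph_h, text_h):
    """One layer: graph attends to text and text attends to graph."""
    graph_ctx, graph_w = cross_attention(graph_h, text_h)
    text_ctx, text_w = cross_attention(text_h, graph_h)
    # Residual updates keep each modality's own features in the mix.
    return graph_h + graph_ctx, text_h + text_ctx, graph_w, text_w


rng = np.random.default_rng(0)
graph_h = rng.normal(size=(5, 8))  # hypothetical: 5 atoms, 8 features
text_h = rng.normal(size=(7, 8))   # hypothetical: 7 tokens, 8 features

# Apply the exchange at every layer rather than only at the last one.
for _ in range(3):
    graph_h, text_h, graph_w, text_w = bidirectional_layer(graph_h, text_h)

print(graph_h.shape, text_h.shape, graph_w.shape)
```

Running the exchange inside the loop is the point of the sketch: each layer's features in one modality are conditioned on the other modality's features at the same depth, in contrast to fusing only the final encoder outputs.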
Citations: 0
Bioinformatics frameworks for single-cell long-read sequencing: unlocking isoform-level resolution.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-11-01 DOI: 10.1093/bib/bbaf655
Saloni Bhatia, Matt A Field, Lionel Hebbard, Ulf Schmitz

Alternative splicing (AS) plays a key role in regulating gene expression, and its dysregulation is implicated in numerous human diseases, including cancer. While bulk RNA sequencing has advanced our understanding of AS, it cannot capture cellular heterogeneity or reliably reconstruct full-length isoforms, both of which underpin disease mechanisms and therapeutic responses. Single-cell RNA sequencing (scRNA-seq) is an established and powerful approach for examining AS landscapes at single-cell resolution, enabling the identification of cell-specific aberrant splicing events that may contribute to disease. However, conventional scRNA-seq is limited by short read lengths, which often prevent accurate reconstruction of full-length transcript isoforms. This limitation is addressed by long-read RNA-seq (lrRNA-seq), which can sequence full-length RNA molecules, some exceeding 100 000 nucleotides in length. As a result, lrRNA-seq enables more accurate characterization of isoform diversity, identification of novel splice variants, quantification of percent spliced-in values, and detection of fusion transcripts. The convergence of single-cell resolution and third-generation sequencing technologies has led to the development of single-cell long-read sequencing (SCLR-seq), a powerful approach that addresses the key constraints of bulk short-read RNA-seq by providing isoform-level resolution and cell-type specificity. This review explores the growing utility of SCLR-seq, highlighting recent developments in bioinformatics tools and pipelines designed for SCLR-seq data analysis. We discuss how this emerging technology is transforming our understanding of isoform regulation and aberrant splicing in human diseases, and its potential to uncover novel diagnostic and therapeutic targets.
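The percent spliced-in (PSI) value mentioned above is commonly defined as the fraction of reads supporting inclusion of an exon or splice event out of all reads informative for that event. A minimal sketch, assuming per-cell inclusion and exclusion read counts are already available; the function name, the cell labels, and the counts are hypothetical and not taken from any tool discussed in the review.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """PSI = inclusion / (inclusion + exclusion).

    Returns None when no informative reads cover the event, since the
    ratio is undefined in that case.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Hypothetical per-cell read counts for one splicing event:
# cell label -> (inclusion reads, exclusion reads)
counts = {
    "cell_A": (30, 10),
    "cell_B": (2, 18),
    "cell_C": (0, 0),  # event not covered in this cell
}

psi_by_cell = {cell: percent_spliced_in(inc, exc) for cell, (inc, exc) in counts.items()}
print(psi_by_cell)
```

Returning `None` for uncovered events, rather than 0, keeps "not observed" distinct from "always skipped", which matters in sparse single-cell data.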
Citations: 0