Briefings in bioinformatics: latest articles

Finding Significant Hits in Networks: a network-based tool for analyzing gene-level P-values to identify significant genes missed by standard methods.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag061

Sandeep Acharya, Vaha Akbary Moghaddam, Wooseok J Jung, Yu S Kang, Shu Liao, Michael A Province, Michael R Brent

Finding Significant Hits in Networks (FISHNET) uses prior biological knowledge, represented as gene interaction networks and gene function annotations, to identify genes that do not meet the genome-wide significance threshold but replicate, nonetheless. Its input is gene-level P-values from any source, including omicsWAS, aggregation of genome-wide association studies P-values, CRISPR screens, or differential expression analysis. It is based on the idea that genes whose P-values are low purely by chance are distributed randomly across networks and functions, so genes with suggestive P-values that cluster in densely connected subnetworks and share common functions are less likely to reflect chance and more likely to replicate. FISHNET combines network and function analysis with permutation-based P-value thresholds to identify a small set of exceptional genes that we call FISHNET genes. Applied to 11 cardiovascular risk traits, FISHNET identified 19 gene-trait relationships that missed genome-wide significance thresholds but, nonetheless, replicated in an independent cohort. The replication rate of FISHNET genes matched that of genes with lower P-values. FISHNET identified a novel association between RUNX1 expression and HDL that is supported by experimental evidence that RUNX1 promotes white fat browning, which increases HDL cholesterol levels. FISHNET also identified an association between LTB expression and BMI that is supported by experimental evidence that higher LTB expression increases BMI via activation of the LTβR pathway. Both associations failed genome-wide significance thresholds, highlighting FISHNET's ability to uncover meaningful relationships missed by traditional methods. FISHNET software is freely available at https://brentlab.github.io/fishnet/.
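The premise stated in the abstract above (chance hits scatter randomly across a network, while real hits cluster) can be illustrated with a small permutation test. This is a minimal sketch under stated assumptions, not FISHNET's actual algorithm: the toy network, the gene names, the edge-count statistic, and the function names `edges_within` and `permutation_p_value` are all hypothetical.

```python
import random
from itertools import combinations


def edges_within(gene_set, edges):
    """Count network edges whose two endpoints are both in gene_set."""
    members = set(gene_set)
    return sum(1 for a, b in edges if a in members and b in members)


def permutation_p_value(suggestive, all_genes, edges, n_perm=2000, seed=0):
    """Empirical P-value for how densely the suggestive genes are connected.

    Null model (an assumption of this sketch): the suggestive genes are a
    uniformly random subset of all genes in the network.
    """
    rng = random.Random(seed)
    observed = edges_within(suggestive, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(suggestive))
        if edges_within(sample, edges) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy network: g0..g3 form a fully connected module; g4..g29 form a sparse chain.
all_genes = [f"g{i}" for i in range(30)]
module_edges = list(combinations(["g0", "g1", "g2", "g3"], 2))
chain_edges = [(f"g{i}", f"g{i + 1}") for i in range(4, 29)]
edges = module_edges + chain_edges

clustered = ["g0", "g1", "g2", "g3"]     # suggestive genes in one module
scattered = ["g5", "g11", "g17", "g23"]  # suggestive genes with no shared edges

p_clustered = permutation_p_value(clustered, all_genes, edges)
p_scattered = permutation_p_value(scattered, all_genes, edges)
print(f"clustered: p = {p_clustered:.4f}, scattered: p = {p_scattered:.4f}")
```

In this toy setting the clustered set gets a small empirical P-value and the scattered set does not, which is the intuition behind prioritizing suggestive genes that sit together in a dense subnetwork.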

Volume 27, Issue 2 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12967332/pdf/

Citations: 0

Master of Metals2: a graph neural network based architecture for the prediction of zinc binding sites in protein structures.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag078

Vincenzo Laveglia, Cosimo Ciofalo, Enrico Morelli, Claudia Andreini, Antonio Rosato

Zinc ions play essential structural and catalytic roles in a wide range of proteins. Accurate prediction of their binding sites is crucial for structural and functional annotation. We present MoM2, a web-accessible tool for predicting zinc-binding sites in protein 3D structures. MoM2 employs a graph neural network trained exclusively on spatial features (specifically, Cα and Cβ coordinates), eliminating the need for templates or sequence-based heuristics. The tool efficiently processes entire proteomes within hours and demonstrates strong predictive performance. In a benchmark of 412 experimentally determined apo-structures, MoM2 outperformed existing methods, achieving the highest F1-score (55.7%) and the lowest false discovery rate (44.1%). The web interface supports input via structure files, PDB or UniProt IDs, and allows batch processing with customizable thresholds. As an independent validation, MoM2 correctly identified 18 out of 20 predicted zinc sites in SARS-CoV-2 proteins. The tool is freely available at https://mom2.cerm.unifi.it.
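The two benchmark figures quoted above can be related by a short calculation. The sketch below assumes the standard definitions (false discovery rate = 1 - precision, and F1 as the harmonic mean of precision and recall) and assumes both numbers were computed on the same set of predictions; the abstract does not report recall, so the value derived here is only what those assumptions imply. The function name `recall_from_f1_and_fdr` is hypothetical.

```python
def recall_from_f1_and_fdr(f1, fdr):
    """Return (precision, recall) implied by an F1-score and a false discovery rate.

    Assumes FDR = FP / (TP + FP) = 1 - precision and
    F1 = 2 * P * R / (P + R), which rearranges to R = F1 * P / (2 * P - F1).
    """
    precision = 1.0 - fdr
    denominator = 2.0 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 and FDR are inconsistent under these definitions")
    recall = f1 * precision / denominator
    return precision, recall


# Figures quoted in the abstract: F1 = 55.7%, FDR = 44.1%.
precision, recall = recall_from_f1_and_fdr(0.557, 0.441)
print(f"precision = {precision:.3f}, implied recall = {recall:.3f}")
```

Under these assumptions the precision is 55.9% and the implied recall is roughly 55.5%, that is, precision and recall are nearly balanced at the reported operating point.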

Volume 27, Issue 2 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951075/pdf/

Citations: 0

AI-driven computational methods and benchmarking for T-cell antigen identification.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag123

Yang Deng, Jinhao Que, Guangfu Xue, Yideng Cai, Wenyi Yang, Yilin Wang, Yi Hui, Zuxiang Wang, Yi Lin, Wenyang Zhou, Zhaochun Xu, Qinghua Jiang, Haoxiu Sun

The rise of mRNA vaccines highlights the pivotal role of T-cell antigen identification in modern vaccinology and personalized medicine. T-cell recognition relies on the sophisticated ternary interaction between the T-cell receptor (TCR), the major histocompatibility complex (MHC) molecule, and the peptide antigen, which forms the peptide-MHC (pMHC) complex. Computational methods, particularly artificial intelligence (AI), are indispensable for accurately predicting these complex bindings. This review systematically surveys the rapidly evolving AI-driven landscape for T-cell antigen identification, providing a comprehensive categorization of methods for MHC-I, MHC-II, and the highly complex TCR-pMHC binding prediction, alongside foundational data resources. Crucially, we conduct a rigorous, standardized benchmarking of 18 state-of-the-art TCR-pMHC prediction models across diverse training data sources. Our evaluation on two distinct and challenging out-of-distribution (OOD) unseen epitope variant datasets reveals a significant and concerning generalization gap in current predictors. Notably, the overall absolute predictive gain remains marginal across all models under OOD conditions. This result underscores a severe and persistent generalization challenge when faced with novel epitope variants. To address these limitations, we emphasize the urgent need for enhanced structural modeling, the integration of multi-omics data, and the development of generative models for de novo TCR design. By advancing these computational frontiers, our community can accelerate the transition from prediction to rational design in immunoinformatics.
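The out-of-distribution setting described above depends on test epitopes never appearing in training. A minimal sketch of such an epitope-level split follows; it is not the review's benchmarking protocol. The function name `split_by_epitope`, the `tcr_*` identifiers, and the labels are hypothetical placeholders, and the epitope strings serve only as example keys.

```python
import random


def split_by_epitope(pairs, test_fraction=0.25, seed=0):
    """Split (tcr, epitope, label) records so test epitopes are unseen in training.

    Whole epitopes are held out, rather than individual records, so a model
    cannot score well on the test set by memorizing epitope identity.
    """
    epitopes = sorted({epitope for _, epitope, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(epitopes)
    n_test = max(1, round(len(epitopes) * test_fraction))
    held_out = set(epitopes[:n_test])
    train = [p for p in pairs if p[1] not in held_out]
    test = [p for p in pairs if p[1] in held_out]
    return train, test


# Toy records: TCR identifiers and binding labels are placeholders.
pairs = [
    ("tcr_1", "GILGFVFTL", 1), ("tcr_2", "GILGFVFTL", 0),
    ("tcr_3", "NLVPMVATV", 1), ("tcr_4", "NLVPMVATV", 0),
    ("tcr_5", "GLCTLVAML", 1), ("tcr_6", "GLCTLVAML", 0),
    ("tcr_7", "ELAGIGILTV", 1), ("tcr_8", "ELAGIGILTV", 0),
]

train, test = split_by_epitope(pairs)
train_epitopes = {p[1] for p in train}
test_epitopes = {p[1] for p in test}
print("held-out epitopes:", sorted(test_epitopes))
```

A random record-level split would leak each epitope into both partitions, which is why performance under that split can look much better than under the unseen-epitope evaluation the review reports.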

Volume 27, Issue 2 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12993716/pdf/

Citations: 0

Decoding TCR recognition via geometric deep learning of immunological fingerprints.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag048

Chun Shang, Kevin C Chan, Ruhong Zhou

T cell receptor (TCR) recognition of peptide-major histocompatibility complex (pMHC) molecules is the critical first step in adaptive immune activation, shaping immunity against pathogens and tumors, as well as tolerance to self. Despite extensive structural characterization of TCR-pMHC complexes, the molecular principles underlying this process remain incompletely understood, hindered by the inherent duality of TCR specificity and cross-reactivity. Traditional structural analyses often fall short in capturing the multidimensional features that govern TCR-pMHC engagement. Here, we introduce a multimodal geometric deep learning framework that systematically extracts and learns various physicochemical and spatial features from pMHC interfaces, which encode key immunological cues for TCR recognition. Applied to a curated dataset of human leukocyte antigen HLA-A*02-peptide-TCR crystal structures, our model robustly predicts TCR binding preferences and uncovers interfacial "immunological fingerprints" that inform receptor engagement. Through an integrated explainability module, we identify critical contact residues and interaction motifs, thus providing interpretable insights into the determinants of TCR specificity. We further demonstrate the model's generalizability by analyzing HLA-B*27-peptide complexes, revealing potential TCR cross-reactivity between self-derived and bacterial peptides and highlighting its utility in probing molecular mimicry. This work establishes a scalable, structure-based approach for decoding T cell recognition and offers a powerful tool for guiding antigen design, vaccine development, and TCR-based immunotherapies.

Volume 27, Issue 2 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12989321/pdf/

Citations: 0

Co-expression network multivariate regression.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag125

Hwiyoung Lee, Yezhi Pan, Shuo Chen

Accounting for dependence among high-dimensional variables in omics data analysis is critical to obtain accurate and reliable statistical inference. Although latent, omics variables often exhibit structured correlation/co-expression patterns. However, few methods explicitly account for such structured dependence in the statistical analysis of omics data (e.g. differential expression analysis). To address this methodological gap, we propose Co-expression network multivariate Regression (CoReg), which integrates co-expression network structure into multivariate regression analysis to precisely account for the inter-correlations (dependence) among omics variables. We show in simulations that CoReg substantially improves the accuracy of statistical inference and replicability across studies. These findings suggest that CoReg provides an alternative approach for omics data association analysis with dependence adjustment, analogous to the role of mixed-effects models in handling repeated measures in lower-dimensional settings.

Citations: 0

Advances in predicting omics profiles from imaging data.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag090

Alexa H Beachum, Xue Xiao, Yuansheng Zhou, Qiwei Li, Guanghua Xiao, Lin Xu

While traditional imaging techniques, such as histopathology, are often part of clinical workflows, molecular profiling remains more difficult to conduct and is less cost-effective. Thus, the prediction of molecular 'omics' data directly from imaging has emerged as an appealing alternative. While existing reviews have mentioned image-based prediction of biomarkers within specific disease contexts, this review provides a comprehensive overview of current methods that leverage imaging to predict (i) DNA-based aberrations, (ii) bulk transcriptomic profiles, (iii) single-cell transcriptomics, and (iv) spatial transcriptomics across disease contexts and imaging modalities. To address the complexity of these predictive tasks, we find that many studies employ cutting-edge deep learning strategies for image processing, feature extraction, feature aggregation, and downstream molecular prediction. In this review, we highlight the diverse applications of both deep learning-based and modern statistical frameworks designed for image-based omics prediction. The insights gleaned from these inferred molecular data have broad clinical relevance and will continue to improve our understanding of the relationships between molecular and visual features, paving the way for new diagnostic and therapeutic applications.

Volume 27, Issue 2 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12971000/pdf/

Citations: 0

ML-ExonCNV: a robust XGBoost multi-expert ensemble framework for rare exon CNV detection in whole-exome sequencing data.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag100

Shuang-Hao Yang, Hua He, Shuyu Hou, Tuanfeng Yang, Zehao Yin, Hong-Yu Zhang, Weiyue Gu

Copy number variants (CNVs) have been shown to play a significant role in the pathogenesis of various human diseases. Although several tools have been developed for detecting CNVs based on whole-exome sequencing (WES) data, their performance remains suboptimal for small exon-level CNVs (exCNVs). This is primarily due to multiple sources of technical variability, including probe capture efficiency, mappability, exon size, batch effects, experimental background noise bias, and control sample selection, all of which can lead to false negatives and false positives in exCNV detection. To address these challenges, we developed ML-ExonCNV, which innovatively integrates the XGBoost machine learning model with a multi-expert ensemble approach. The model was trained using 14 features derived from 22 364 real-world, quantitative polymerase chain reaction-validated rare exCNVs. Evaluation on a test set of 492 real WES samples and the NA12878 gold-standard dataset demonstrated that ML-ExonCNV outperformed widely used tools such as GATK-gCNV, ExomeDepth, and CNVkit. Notably, ML-ExonCNV can detect large segmental CNVs, mosaic CNVs, and breakpoint CNVs in exon regions. Furthermore, our analysis revealed recurrent exCNV-associated genes and their phenotypic correlations. Neurodevelopmental and musculoskeletal abnormalities were identified as the phenotypes most frequently associated with high-recurrence exCNVs.
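The abstract above does not list the 14 features used by ML-ExonCNV. As background, read-depth CNV callers commonly start from a per-exon log2 ratio of sample depth to a reference built from control samples, and the sketch below computes that quantity. It is a generic illustration, not the paper's feature set: the function name `exon_log2_ratios`, the toy depths, and the assumption that depths are already normalized for library size are all hypothetical.

```python
import math
from statistics import median


def exon_log2_ratios(sample_depths, control_depths):
    """Per-exon log2(sample depth / median control depth).

    sample_depths:  list of normalized depths, one per exon.
    control_depths: list of control samples, each a list of per-exon depths.
    Exons with zero sample or reference depth are reported as None.
    """
    ratios = []
    for i, depth in enumerate(sample_depths):
        reference = median(control[i] for control in control_depths)
        if depth <= 0 or reference <= 0:
            ratios.append(None)
        else:
            ratios.append(math.log2(depth / reference))
    return ratios


# Toy data: three exons, three control samples (depths assumed pre-normalized).
controls = [
    [100, 200, 50],
    [110, 190, 50],
    [90, 210, 50],
]
sample = [100, 100, 75]

ratios = exon_log2_ratios(sample, controls)
print([round(r, 3) for r in ratios])
```

A ratio near 0 is consistent with two copies, a ratio near -1 with a heterozygous deletion (half the depth), and a ratio near 0.585 with a single-copy duplication (1.5 times the depth). The technical factors named in the abstract, such as capture efficiency and control selection, are what make this raw ratio noisy for single exons.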

Volume 27, Issue 2 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12981628/pdf/

Citations: 0

A deep adversarial network model for multi-task analysis of single-cell omics data.

IF 7.7 Region 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag016

Junlin Xu, Cheng Guo, Yajie Meng, Shuting Jin, Changcheng Lu, Zilong Zhang, Feifei Cui, Xiangzheng Fu, Quan Zou, Tian Tian, Xiangxiang Zeng

Single-cell multi-omics data reveal complex cellular states and deepen our understanding of tissue cell phenotypes and functions. However, data analysis remains challenging due to the discrete nature and high noise level of the data, as well as missing modalities. Here, we propose scMultiNet, a multi-task deep adversarial neural network that can integrate different tasks to analyze single-cell multi-modal data. In particular, we achieve joint training of multi-modal integration and cross-modal prediction tasks by introducing a cross-modal bi-prediction module and a multi-head self-attention module. Data denoising is further enhanced by integrating an indicator matrix that constrains and precisely reconstructs the original expression values. Extensive simulations and real data experiments demonstrate that scMultiNet outperforms existing state-of-the-art methods in dimensionality reduction, visualization, clustering, batch elimination, data denoising, multi-modal integration, single-cell cross-modality translation, and in revealing cell type-specific biological insights. In addition, we demonstrate that scMultiNet can effectively transfer the complex relationships between modalities from one batch to another. In summary, scMultiNet stands as a comprehensive end-to-end framework, ideally suited for analyzing single-cell multi-omics data.
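The abstract above mentions an indicator matrix that constrains reconstruction of the original expression values but gives no formula. One common way to use such a matrix is a masked reconstruction loss, in which only entries flagged by the indicator contribute to the error. The sketch below shows that idea under the assumption that the indicator marks observed non-zero counts; it is not taken from scMultiNet, and the function name `masked_mse` and the toy values are hypothetical.

```python
def masked_mse(original, reconstructed, indicator):
    """Mean squared error over entries where indicator is 1.

    Entries with indicator 0 (here: zeros that may be dropouts) are ignored,
    so the model is free to impute them without being penalized.
    """
    total = 0.0
    count = 0
    for row_x, row_r, row_m in zip(original, reconstructed, indicator):
        for x, r, m in zip(row_x, row_r, row_m):
            if m:
                total += (x - r) ** 2
                count += 1
    return total / count if count else 0.0


# Toy 2 x 2 expression matrix (cells x genes); zeros may be dropouts.
original = [[0.0, 3.0],
            [2.0, 0.0]]
# Indicator assumed to flag observed non-zero entries.
indicator = [[1 if value > 0 else 0 for value in row] for row in original]
# Hypothetical denoised output: zeros were imputed, one observed value is off by 1.
reconstructed = [[1.5, 3.0],
                 [1.0, 0.7]]

loss = masked_mse(original, reconstructed, indicator)
print(loss)
```

Here the imputed values 1.5 and 0.7 do not affect the loss, while the observed entry reconstructed as 1.0 instead of 2.0 does, which keeps observed expression values anchored during denoising.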


Citations: 0

SigRescueR: a pan-system framework for noise correction and mutational signature identification across sequencing platforms.

IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag099

Peter T Nguyen, Maria Zhivagui

Introduction: Mutational signatures serve as molecular fingerprints of the biological processes and exposures that shape cancer genomes. However, accurate signal recovery remains challenging due to pervasive background variants, sequencing artifacts, technical noise, and platform-specific biases that obscure true mutagenic patterns, hampering biomarker discovery and mechanistic interpretation.

Methods: Here we introduce SigRescueR, a rigorous, pan-system, computational framework based on Bayesian inference designed for noise correction and mutational signature identification. SigRescueR applies statistically robust baseline correction to effectively disentangle true mutational signals from confounding noise and artifacts.

Results: When applied to extensive datasets spanning experimental models and human cancers, SigRescueR reliably identified canonical mutational signatures associated with environmental mutagens such as colibactin, benzo[a]pyrene, and UV radiation, and chemotherapeutic agents, namely 5-fluorouracil and cisplatin. SigRescueR effectively operated across diverse mutation classes, including single base substitutions, insertions and deletions, and doublet base substitutions, while also integrating strand bias and duplex sequencing data for toxicology applications.

Conclusion: SigRescueR offers a unified, high-precision platform that seamlessly integrates cancer genomics, molecular toxicology, and mechanistic studies. It enables precise mapping of mutagenic processes and identification of robust genomic biomarkers of environmental and therapeutic exposures, providing a transformative framework for translational cancer research.

Availability and implementation: SigRescueR is implemented in R and provided as open-source software on GitHub at https://github.com/ZhivaguiLab/SigRescueR/.
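To give a rough sense of what baseline correction means for a mutation spectrum, here is a minimal sketch. It deliberately swaps in a plain channel-wise background subtraction with clipping in place of the Bayesian inference described in the abstract, so it is not SigRescueR's method or API (SigRescueR itself is an R package). The function names, the 4-channel toy spectra, and the assumption that control and treated samples have comparable depth are all hypothetical.

```python
import math


def subtract_background(treated, control):
    """Subtract control counts per channel, clipping negatives to zero."""
    return [max(t - c, 0.0) for t, c in zip(treated, control)]


def cosine_similarity(a, b):
    """Cosine similarity between two spectra; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 4-channel spectra; real single base substitution spectra
# are usually tabulated over 96 channels.
control = [10.0, 10.0, 10.0, 10.0]   # background mutations (untreated)
treated = [12.0, 50.0, 9.0, 30.0]    # background plus exposure signal
reference = [0.0, 2.0, 0.0, 1.0]     # hypothetical reference signature

corrected = subtract_background(treated, control)
similarity_raw = cosine_similarity(treated, reference)
similarity_corrected = cosine_similarity(corrected, reference)
print(corrected, similarity_raw, similarity_corrected)
```

In this toy case the corrected spectrum matches the reference more closely than the raw one. A simple subtraction like this ignores count uncertainty, which is the kind of limitation a Bayesian treatment is meant to address.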



Citations: 0

Zero-shot benchmarking of RNA language models in structural, functional, and evolutionary learning.

IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics

Pub Date : 2026-03-01 DOI: 10.1093/bib/bbag098

He Wang, Yikun Zhang, Jie Chen, Jian Zhan, Yaoqi Zhou

RNA language models (LMs) are increasingly applied to RNA structure and function analysis, yet their intrinsic representational capacities remain poorly characterized. Here, we present a standardized zero-shot evaluation of 21 RNA LMs, with representative DNA LMs included as reference controls. Three complementary tasks (attention-based RNA secondary structure prediction, embedding-based RNA classification, and mutational fitness estimation from sequence likelihoods) are evaluated without downstream fine-tuning. Our results reveal substantial variability across models and clear trade-offs between structural, functional, and evolutionary representations. RNA-specific, noncoding RNA-enriched pretraining is crucial for capturing structural information, while evolutionary signals from multiple sequence alignments substantially boost performance. Although model scaling yields gains, architectural and objective choices critically influence performance across task categories. Together, this study provides a foundational benchmark, highlights inherent challenges in learning unified RNA representations, and offers insights for developing next-generation RNA foundation models.
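Mutational fitness estimation from sequence likelihoods is commonly done by comparing the log-likelihood a model assigns to a mutant against the wild type. The sketch below shows that log-likelihood ratio under a strong simplifying assumption: a fixed table of per-position nucleotide probabilities stands in for a language model, whereas a real RNA LM would produce context-dependent probabilities. The function names and the probability table are hypothetical, and this is not necessarily the scoring protocol used in the benchmark.

```python
import math


def sequence_log_likelihood(sequence, position_probs):
    """Sum of log-probabilities assigned to each observed nucleotide."""
    return sum(math.log(position_probs[i][nt]) for i, nt in enumerate(sequence))


def mutation_score(wild_type, mutant, position_probs):
    """Log-likelihood ratio; negative values mean the mutant is less likely."""
    return (sequence_log_likelihood(mutant, position_probs)
            - sequence_log_likelihood(wild_type, position_probs))


# Hypothetical per-position probabilities standing in for model output.
position_probs = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25},
]

wild_type = "AGC"
mutant = "CGC"  # A -> C substitution at the first position

score = mutation_score(wild_type, mutant, position_probs)
print(score)
```

Because no labels or fine-tuning are involved, scores of this kind can be correlated directly with measured fitness, which is what makes the evaluation zero-shot.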



Citations: 0