
Briefings in Bioinformatics: Latest Articles

Enhancing cancer classification accuracy with a self-attention network using panel capture sequencing data.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag120
Yi Jia, Chan Zhang, Han Zhang, Kang Dong, Yuruo Hu, Yinan Wang, Zicheng Zhao

Cancer classification is pivotal for precision oncology, yet traditional methods struggle with the molecular heterogeneity of tumors. Our study introduces a self-attention based Conv1D machine learning network designed for panel capture sequencing data, which is more commonly used in clinical settings. Combining clinical capture sequencing data and The Cancer Genome Atlas data, we achieved an overall classification accuracy of over 90%, with precision rates reaching 100% for cervical and gastric cancers. Additionally, recall rates were highest at 95.79% for gastric cancer and lowest at 77.46% for cervical cancer, demonstrating robust performance across various cancer types. The model identified key genes such as C3orf36, JHY, and TASP1, showing significant differences in mutation counts across cancers. High-impact gene enrichment analysis highlighted critical pathways like acute myeloid leukemia and adipocytokine signaling. This approach not only significantly improves the precision of cancer classification, demonstrating the potential for clinical application, but also enhances our understanding of cancer biology.
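
The abstract reports 100% precision but only 77.46% recall for cervical cancer. A minimal sketch of how per-class precision and recall are computed may help explain how both can hold at once: a class has perfect precision when none of its predictions are wrong, even if some of its true samples are assigned elsewhere. The labels below are invented for illustration and are not the paper's data; the function name is hypothetical.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} from paired true/predicted labels."""
    tp = Counter()      # correct predictions per class
    pred_n = Counter()  # how often each class was predicted
    true_n = Counter()  # how often each class actually occurs
    for t, p in zip(y_true, y_pred):
        true_n[t] += 1
        pred_n[p] += 1
        if t == p:
            tp[t] += 1
    out = {}
    for label in sorted(set(true_n) | set(pred_n)):
        precision = tp[label] / pred_n[label] if pred_n[label] else float("nan")
        recall = tp[label] / true_n[label] if true_n[label] else float("nan")
        out[label] = (precision, recall)
    return out


# Toy labels (illustrative only): one cervical sample is misread as gastric.
y_true = ["cervical"] * 4 + ["gastric"] * 4
y_pred = ["cervical", "cervical", "cervical", "gastric",
          "gastric", "gastric", "gastric", "gastric"]
metrics = per_class_precision_recall(y_true, y_pred)
print(metrics)
```

In this toy case every "cervical" prediction is correct (precision 1.0), yet one true cervical sample is missed (recall 0.75), which mirrors the pattern described in the abstract.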

Citations: 0
Kun-peng enables scalable and accurate pan-domain metagenomic classification.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag119
Qiong Chen, Boliang Zhang, Chen Peng, Jiajun Huang, Zhen Liu, Xiaotao Shen, Chao Jiang

Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying the rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier powered by an intelligent block-partitioned database structure and optimized search strategies, enabling ultra-scalable, memory-efficient pan-domain profiling. Using the Critical Assessment of Metagenome Interpretation II benchmark, Kun-peng substantially reduces the memory usage of database-building and querying by up to 24-fold, and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we performed classification using a 4.3 TB pan-domain database comprising 204,477 genomes, which was built by Kun-peng with only 4.1 GB peak memory. Kun-peng processed each sample in 0.2-11.2 min with 4.0-35.4 GB peak memory, corresponding to a 54-473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng achieved up to a 46-fold speedup while requiring 21-fold less memory. Kun-peng classified 69.8%-94.3% of reads, improving coverage by 20%-60% over the standard Kraken2 database with 62,026 genomes. This improvement reflects expanded reference coverage, although a small fraction of false positives is inherent to k-mer-based methods. Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid and scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.
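
The abstract does not describe the internals of the block-partitioned database, so the following is only a rough sketch of the general idea, under the assumption that k-mers are routed to fixed blocks by a hash and that queries are grouped so each block needs to be resident only once. All function names and the use of CRC32 are illustrative choices, not Kun-peng's actual design.

```python
import zlib
from collections import defaultdict


def block_of(kmer, n_blocks):
    """Route a k-mer to a block with a deterministic hash (CRC32 is an arbitrary choice)."""
    return zlib.crc32(kmer.encode()) % n_blocks


def build_blocks(kmer_to_taxon, n_blocks):
    """Split one large k-mer -> taxon table into n_blocks smaller tables."""
    blocks = [dict() for _ in range(n_blocks)]
    for kmer, taxon in kmer_to_taxon.items():
        blocks[block_of(kmer, n_blocks)][kmer] = taxon
    return blocks


def classify_kmers(query_kmers, blocks):
    """Group queries by block, then resolve each group against a single block."""
    routed = defaultdict(list)
    for kmer in query_kmers:
        routed[block_of(kmer, len(blocks))].append(kmer)
    hits = {}
    for block_id, kmers in routed.items():
        block = blocks[block_id]  # a real tool would load this block from disk here
        for kmer in kmers:
            if kmer in block:
                hits[kmer] = block[kmer]
    return hits


# Toy reference table (hypothetical k-mers and taxa).
reference = {"ACGT": "taxA", "TTGA": "taxB", "GGCC": "taxC"}
blocks = build_blocks(reference, n_blocks=4)
hits = classify_kmers(["ACGT", "TTGA", "AAAA"], blocks)
print(hits)
```

Because only one block is consulted at a time, peak memory in such a scheme scales with the largest block rather than with the whole database, which is the kind of trade-off the reported memory reductions suggest.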

Citations: 0
Castl: robust identification of spatially variable genes in spatial transcriptomics via an ensemble-based framework.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag074
Yiyi Yu, Jiyuan Yang, Ping-An He, Xiaoqi Zheng

Spatially variable genes (SVGs) are essential for elucidating tissue organization within spatially resolved transcriptomics. While a number of computational methods have been developed for SVG identification, their reliance on algorithm-specific assumptions, such as predefined kernel functions or spatial neighborhood graphs, often results in substantial variability in sensitivity and inflated false discovery rates (FDRs) across heterogeneous datasets. To address this challenge, we here develop Castl, an ensemble-based framework for SVG identification that integrates multiple detection methods through statistically designed aggregation modules. Comprehensive evaluations on both simulated and real-world data demonstrate that Castl consistently identifies biologically meaningful spatial expression patterns, mitigates method-specific biases and effectively controls FDRs across various biological contexts, resolutions, and spatial technologies. This flexible, assumption-free framework offers a robust and standardized foundation for spatially informed feature discovery in complex biological systems.
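
The abstract mentions "statistically designed aggregation modules" without detailing them. As a simple stand-in, the sketch below shows one common way to combine several SVG detectors: rank genes within each method and average the ranks. This mean-rank rule is an assumption chosen for illustration and should not be read as Castl's actual aggregation; method names, gene names, and P-values are invented.

```python
def mean_rank_aggregate(pvalues_by_method):
    """Combine per-method P-values into a consensus ordering by mean rank.

    pvalues_by_method: {method_name: {gene: p_value}}
    Returns a list of (mean_rank, gene), best (lowest mean rank) first.
    """
    # Only genes scored by every method can be ranked consistently.
    genes = sorted(set.intersection(*(set(d) for d in pvalues_by_method.values())))
    total = {g: 0.0 for g in genes}
    for pvals in pvalues_by_method.values():
        ordered = sorted(genes, key=lambda g: pvals[g])  # smallest P-value gets rank 1
        for rank, g in enumerate(ordered, start=1):
            total[g] += rank
    n_methods = len(pvalues_by_method)
    return sorted((total[g] / n_methods, g) for g in genes)


# Toy P-values from three hypothetical detectors.
toy = {
    "methodA": {"g1": 0.001, "g2": 0.2, "g3": 0.04},
    "methodB": {"g1": 0.01, "g2": 0.5, "g3": 0.002},
    "methodC": {"g1": 0.003, "g2": 0.03, "g3": 0.6},
}
consensus = mean_rank_aggregate(toy)
print([gene for _, gene in consensus])
```

Averaging ranks dampens the influence of any single method's assumptions: a gene must score well under several detectors to reach the top of the consensus list.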

Citations: 0
scSCCNIA: similarity matrix based contrastive clustering with neighbor information aggregation for single-cell RNA sequencing data.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag094
Jing Wang, Junfeng Xia, Yansen Su, Chun-Hou Zheng

The development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for elucidating cell heterogeneity and gene expression. Identifying and discovering cell types through cell clustering is a crucial step in analyzing scRNA-seq data. However, the high dimensionality and frequent dropout events of the data pose great challenges for cell clustering. Here, we propose a novel contrastive clustering framework called scSCCNIA (Similarity-matrix-based Contrastive Clustering with Neighbor Information Aggregation) for the accurate identification of cell clusters from scRNA-seq data. scSCCNIA adopts a Laplacian filter to aggregate neighbor information, constructs different graph views for data augmentation using Siamese encoders with unshared parameters, and learns latent low-dimensional embeddings via similarity-matrix-based contrastive learning. Comparative analyses of multiple scRNA-seq datasets from different platforms and with varying cell numbers demonstrate that scSCCNIA outperforms existing methods in terms of cell clustering and marker gene identification. Furthermore, scSCCNIA reveals the heterogeneity and functional specificity of various cell types through Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses. Overall, scSCCNIA is an effective algorithm for learning latent features from scRNA-seq data, enhancing cell type identification accuracy and facilitating downstream analyses of scRNA-seq data.
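
To illustrate what neighbor information aggregation with a Laplacian filter can look like, here is a minimal sketch assuming the widely used low-pass filter H = I - L_sym, where L_sym is the symmetrically normalized Laplacian of the cell graph with self-loops (so H equals the normalized adjacency). The exact filter used by scSCCNIA is not given in the abstract; the graph and feature values below are invented.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, which equals I - L_sym for the self-looped graph."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def laplacian_smooth(adj, features, steps=2):
    """Apply the filter `steps` times so each cell's features blend with its neighbors'."""
    h = normalized_adjacency(adj)
    x = [row[:] for row in features]
    n, d = len(x), len(x[0])
    for _ in range(steps):
        x = [[sum(h[i][k] * x[k][f] for k in range(n)) for f in range(d)] for i in range(n)]
    return x


# Toy graph: cells 0-1 are neighbors, cells 2-3 are neighbors.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
# One gene; the zero in cell 1 could be read as a dropout.
features = [[1.0], [0.0], [5.0], [4.0]]
smoothed = laplacian_smooth(adj, features, steps=2)
print(smoothed)
```

After filtering, the zero in cell 1 is pulled toward its neighbor's value while the two groups stay separated, which is the intuition behind using such filters on sparse scRNA-seq data.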

Citations: 0
A progressive fine-tuning framework with dynamic parameter selection for low-resource peptide-GPCR interaction prediction.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag116
Mingqing Liu, Jinhui Xu, Ji Liu

G protein-coupled receptors (GPCRs) are among the most important drug targets, and peptide therapeutics are rapidly emerging. However, accurate prediction of peptide-GPCR interactions (PepGI) remains challenging due to the scarcity of high-quality data and the poor generalization of existing drug-target interaction (DTI) models, which are largely trained on small molecule data. Here, we introduce a progressive fine-tuning framework with a dynamic parameter selection strategy that adaptively selects critical fine-tuning parameters using Fisher information. Our method begins with pretraining on a large small molecule-GPCR dataset, followed by intermediate fine-tuning on peptide-target data to alleviate the representation mismatch across heterogeneous ligand modalities. Finally, the task-specific fine-tuning is performed on the low-resource PepGI scenario. Extensive experiments show that our approach significantly outperforms baselines across multiple evaluation metrics, and exhibits robust generalization under few-shot and practical cold-start settings. Overall, this work offers an effective solution for low-resource peptide-GPCR prediction and presents a transferable framework for cross-structure DTI modeling.
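
The parameter selection step can be sketched as follows, assuming the common diagonal empirical Fisher approximation (the mean squared per-sample gradient of the log-likelihood for each parameter) and a simple top-fraction rule. The abstract does not state which approximation or threshold the authors use, so the function names, the fraction, and the gradient values here are hypothetical.

```python
def diagonal_fisher(per_sample_grads):
    """Estimate the Fisher diagonal as the mean squared gradient per parameter."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def select_top_parameters(fisher, fraction):
    """Return indices of the parameters with the largest Fisher values."""
    k = max(1, round(fraction * len(fisher)))
    ranked = sorted(range(len(fisher)), key=lambda i: fisher[i], reverse=True)
    return sorted(ranked[:k])


# Toy per-sample gradients for a 4-parameter model (illustrative values).
grads = [
    [0.1, -2.0, 0.0, 0.5],
    [0.2, 1.0, 0.0, -0.5],
    [-0.1, -1.5, 0.1, 0.4],
]
fisher = diagonal_fisher(grads)
trainable = select_top_parameters(fisher, fraction=0.5)
print(trainable)
```

In a fine-tuning loop, only the selected indices would receive updates while the rest stay frozen; recomputing the selection periodically would be one way to make it "dynamic", though that detail is an assumption.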

Citations: 0
Publisher's Note: Addendum to Volume 26, Issue Supplement 1, December 2025, International Conference on Genome Informatics ISCB-Asia 2025 Abstract Book.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag026
Citations: 0
An integrative association analysis for complex diseases in underrepresented groups by leveraging the trans-ethnic genetic similarity.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag103
Shuo Zhang, Jike Qi, Yuchen Jiang, Hua Lin, Xinyi Wang, Ting Wang, Hongyan Cao, Ping Zeng

Genome-wide association studies (GWASs) have been conducted primarily in European (EUR) populations, limiting insights into underrepresented groups such as East Asian (EAS) populations; however, cross-ancestry GWASs have demonstrated high trans-ethnic genetic similarity between EUR and non-EUR populations. To enhance association analysis power in EAS populations, we propose tranScore, a novel summary-statistics-based transfer learning method that leverages trans-ethnic genetic similarity through hierarchical modeling. By treating EUR as the auxiliary population, tranScore performs joint testing of genetic effects in the auxiliary and target populations via well-established P-value combination procedures. Simulations demonstrate that tranScore maintains control of type I error rates and provides substantial power gains for diverse genetic architectures, showing robustness against various challenges including incomplete SNP overlap and effect heterogeneity. In the real-data application to eight diseases from the China Kadoorie Biobank (CKB), after incorporating the genetic information of the EUR population, tranScore identified significantly more genes than the traditional score test, which ignored such information. Approximately 41.9% of discovered genes were replicated in the Biobank Japan cohort. Overall, tranScore represents a flexible and powerful statistical approach for association analysis of complex diseases and traits through transfer learning of shared genetic similarities between the auxiliary and target populations.
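
The abstract refers to "well-established P-value combination procedures" without naming them. Two widely used examples are Fisher's method and the Cauchy combination test; the sketch below implements both as an illustration and is not claimed to be the procedure tranScore uses. Fisher's method assumes independent P-values; the input values are invented.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k df under independence."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals the test statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom (2k).
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def cauchy_combine(pvalues, weights=None):
    """Cauchy combination: weighted sum of tan((0.5 - p) * pi), mapped back to a P-value."""
    k = len(pvalues)
    weights = weights or [1.0 / k] * k
    stat = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    return 0.5 - math.atan(stat / sum(weights)) / math.pi


# Hypothetical gene-level P-values from the auxiliary and target populations.
p_aux, p_target = 0.05, 0.05
p_fisher = fisher_combine([p_aux, p_target])
p_cauchy = cauchy_combine([p_aux, p_target])
print(p_fisher, p_cauchy)
```

With two modest signals of 0.05 each, Fisher's method yields a smaller combined P-value, which shows how borrowing evidence from an auxiliary population can raise power; the Cauchy combination is often preferred when the P-values may be correlated.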

Citations: 0
BioMNEDR: mechanism-guided network embedding for drug repurposing.
IF 7.7 | Zone 2 (Biology) | Q1 Biochemical Research Methods | Pub Date: 2026-03-01 | DOI: 10.1093/bib/bbag101
Yizhou Zeng, Lei Wang, Xueming Liu

Drug repurposing provides a cost-effective and time-efficient strategy to accelerate therapeutic discovery, yet most computational approaches fail to capture the multi-scale biomedical mechanisms underlying drug-disease associations, limiting interpretability. We introduce BioMNEDR (mechanism-guided network embedding for drug repurposing) that integrates heterogeneous biomedical networks through biologically curated meta-paths. BioMNEDR generates low-dimensional embeddings preserving protein-protein interactions and functional hierarchies. It further integrates multi-path predictions through an XGBoost classifier. The framework achieves state-of-the-art performance, consistently surpassing strong baselines across AUROC, AUPR, recall, and F1-score, while maintaining a balanced trade-off in precision. Case studies further highlight its practical utility, demonstrating the ability to rediscover approved drugs and prioritize promising candidates, such as cromoglicic acid for Alzheimer's disease. By explicitly modeling multi-scale mechanisms, BioMNEDR enhances both predictive accuracy and biomedical interpretability, offering a robust computational framework for systematic drug repurposing.
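
A meta-path is a typed route through a heterogeneous network, for example Drug to Protein to Disease. The sketch below counts instances of that one meta-path for each drug-disease pair; it is a deliberately simplified illustration of the concept, since BioMNEDR itself learns embeddings along curated meta-paths and combines them with an XGBoost classifier. The node names and edges are invented.

```python
def metapath_counts(drug_targets, protein_diseases):
    """Count Drug -> Protein -> Disease path instances for every drug-disease pair."""
    counts = {}
    for drug, proteins in drug_targets.items():
        for protein in proteins:
            for disease in protein_diseases.get(protein, ()):
                key = (drug, disease)
                counts[key] = counts.get(key, 0) + 1
    return counts


# Toy heterogeneous network (hypothetical drugs, proteins, and diseases).
drug_targets = {"drugA": {"P1", "P2"}, "drugB": {"P3"}}
protein_diseases = {"P1": {"D1"}, "P2": {"D1", "D2"}, "P3": {"D2"}}
counts = metapath_counts(drug_targets, protein_diseases)
print(counts)
```

A pair connected by many such paths, here drugA and D1 through two shared proteins, has a mechanistic link that can be inspected directly, which is the sense in which meta-path features support interpretability.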

Citations: 0
Impact of control selection strategies on GWAS results: a study of prostate cancer in the UK Biobank.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag102
Jingzhan Lu, Johan H Thygesen, Robin N Beaumont, Michael N Weedon, Harry D Green

As genome-wide association studies (GWAS) move from array-based genotyping to whole exome and genome sequencing, there is a significant increase in cost. In large studies where more potential controls are available than needed, applying an appropriate technique to select which controls to include may help minimize resource intensity whilst maintaining statistical power. We evaluated three control selection strategies in prostate cancer GWAS using 15 250 UK Biobank cases: (a) all controls, (b) matched controls, and (c) random selection. Both (b) and (c) achieved comparable power in detecting significant loci relative to (a), but matched controls (b) showed greater consistency in identifying leading single nucleotide polymorphisms (SNPs). However, using (b) matched controls reduced discovery power by ~30% compared with (a) all controls, highlighting a trade-off. Matching controls (1:4 ratio) offers a cost-effective approach for targeted SNP analysis across phenotypes but may miss novel associations.
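
As an illustration of strategies (b) and (c) above, the following Python sketch selects controls at a 1:4 ratio either by matching or at random. The abstract does not state the matching variables or the matching algorithm, so matching on age within a caliper, the greedy pass, and every field and function name here are assumptions made for illustration only, not the authors' method.

```python
import random


def select_matched_controls(cases, controls, ratio=4, age_caliper=2, seed=0):
    """Greedy matching sketch (hypothetical): for each case, take up to
    `ratio` unused controls whose age is within `age_caliper` years."""
    rng = random.Random(seed)
    pool = list(controls)
    rng.shuffle(pool)  # shuffle so the greedy pass is not biased by input order
    used = set()
    matched = {}
    for case in cases:
        picks = []
        for ctrl in pool:
            if ctrl["id"] in used:
                continue  # each control is assigned to at most one case
            if abs(ctrl["age"] - case["age"]) <= age_caliper:
                picks.append(ctrl["id"])
                used.add(ctrl["id"])
                if len(picks) == ratio:
                    break
        matched[case["id"]] = picks  # may hold fewer than `ratio` ids
    return matched


def select_random_controls(cases, controls, ratio=4, seed=0):
    """Random selection sketch: draw ratio * len(cases) controls without
    replacement, ignoring covariates."""
    rng = random.Random(seed)
    n = min(len(controls), ratio * len(cases))
    return [c["id"] for c in rng.sample(list(controls), n)]
```

Both functions return the same number of controls when enough eligible controls exist, so a comparison between them isolates the effect of matching rather than of sample size. A greedy pass is the simplest choice; it does not guarantee an optimal assignment when eligible controls are scarce.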

Citations: 0
Transformer-based multidimensional feature fusion for accurate prediction of lipid nanoparticles transfection efficiency.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag092
Daohong Gong, Xiaowei Xie, Jianxin Tang, Shiliang Li, Honglin Li

RNA-based technologies have demonstrated significant potential for diverse applications, ranging from vaccination to gene editing. However, their widespread adoption is limited by the critical challenge of efficient delivery. Lipid nanoparticles (LNPs) have emerged as a widely utilized RNA delivery system, yet their formulation design and optimization primarily rely on empirical trial-and-error, which is labor-intensive, time-consuming, and cost-prohibitive, thus hindering the rapid development of RNA therapeutics. To facilitate the early-stage design and optimization of LNPs for enhanced delivery efficiency, in this study, we construct LNPs-TE, a benchmark dataset comprising over 10 000 experimentally measured transfection efficiency (TE) values, and introduce LNPs integrated feature fusion Transformer (LIFT), a deep learning framework for LNPs TE prediction. Comprehensive experiments demonstrate that LIFT effectively integrates multidimensional molecular representations of ionizable lipids, the key component in LNPs formulation, achieving superior predictive performance, with an average Pearson correlation coefficient of 0.845 for regression and an area under the receiver operating characteristic curve (AUC-ROC) of 0.818 for multi-class classification across multiple datasets. Through scaffold-based splitting and activity cliff tasks, we further validated the exceptional generalization ability and robustness of LIFT, which achieved an improvement of over 10% in the coefficient of determination (R²) compared with state-of-the-art baseline models, highlighting its potential as a practical and stable approach for the virtual screening of efficient LNPs formulation. The relevant data, model and code are made publicly available at https://github.com/U12458/LIFT.
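
The two regression metrics named above measure different things, which a small worked example may make clearer. This Python sketch computes the Pearson correlation coefficient and the coefficient of determination from their textbook definitions; the function names and toy values are illustrative assumptions and are unrelated to the LIFT code or the LNPs-TE data.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation: strength of the linear relationship,
    insensitive to the scale and offset of the predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot. It penalizes
    any deviation from the true values, including scale errors."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values: predictions are perfectly correlated but off by a factor of 2.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
```

In this toy case the correlation is perfect while R² is negative, because the predictions track the trend but miss the actual values. That gap is one reason a study might report both metrics rather than only one.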

Citations: 0