Computational methods for therapeutic target discovery face challenges in integrating multi-scale biological data and predicting system-wide therapeutic effects. We present a computational framework that integrates single-cell transcriptomics, weighted gene co-expression network analysis (WGCNA), and computational perturbation simulation to systematically discover novel therapeutic targets. The framework constructs a knowledge graph of 29 896 nodes (23 530 genes and 6366 pathways) and 322 136 edges, integrating gene-gene, gene-pathway, and module relationships. Using perturbation simulation algorithms, we systematically explored 265 candidate targets, scoring each by perturbation magnitude, module response, therapeutic effect, and statistical significance. Applied to single-cell RNA sequencing data from 38 patients (51 415 cells), the framework identified 66 novel therapeutic targets, nine of which have very high novelty. Computational validation demonstrates efficient scalability (knowledge graph construction: <5 min; 265 perturbation simulations: <2 min) and robust performance across different module sizes. This approach provides a new computational method for systems-level therapeutic target discovery that is generalizable to other complex diseases.
{"title":"Computational framework for therapeutic target discovery via perturbation simulation: application to cystic fibrosis airway disease.","authors":"George Sun, Yi-Hui Zhou","doi":"10.1093/bib/bbag129","DOIUrl":"https://doi.org/10.1093/bib/bbag129","url":null,"abstract":"<p><p>Computational methods for therapeutic target discovery face challenges in integrating multi-scale biological data and predicting system-wide therapeutic effects. We present a computational framework that integrates single-cell transcriptomics, weighted gene co-expression network analysis (WGCNA), and computational perturbation simulation to systematically discover novel therapeutic targets. The framework constructs a knowledge graph comprising 29 896 nodes (23 530 genes and 6366 pathways) with 322 136 edges, integrating gene-gene, gene-pathway, and module relationships. Using perturbation simulation algorithms, we systematically explored 265 candidate targets, scoring each based on perturbation magnitude, module response, therapeutic effect, and statistical significance. Applied to single-cell RNA sequencing data from 38 patients (51 415 cells), the framework identified 66 novel therapeutic targets, including nine very high novelty targets. Computational validation demonstrates efficient scalability (knowledge graph construction: <5 min; 265 perturbation simulations: <2 min) and robust performance across different module sizes. 
This approach represents a novel computational method for systems-level therapeutic target discovery, with generalizable applications to other complex diseases.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147509795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
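The abstract does not say how a perturbation is propagated or scored, so the following is only an illustration. It uses random walk with restart (RWR) on a weighted gene graph as a stand-in for perturbation simulation. It combines two of the four listed criteria, magnitude and module response, with arbitrary weights. The function names, restart probability, weights, and toy graph are all assumptions and are not the authors' implementation.

```python
# Hypothetical sketch: random walk with restart (RWR) as a stand-in for the
# paper's perturbation simulation; not the authors' algorithm.

def rwr(adj, seed, restart=0.3, iters=200, tol=1e-12):
    """Steady-state visiting probabilities of a walk that restarts at `seed`.

    adj: dict mapping node -> dict of neighbor -> edge weight (undirected).
    """
    p = {n: 0.0 for n in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for node, mass in p.items():
            total = sum(adj[node].values())
            if total == 0:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += (1 - restart) * mass
                continue
            for nb, w in adj[node].items():
                nxt[nb] += (1 - restart) * mass * w / total
        nxt[seed] += restart
        delta = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if delta < tol:
            break
    return p


def perturbation_score(adj, target, module, w_mag=0.5, w_mod=0.5):
    """Illustrative two-term score (weights are assumptions):
    magnitude = probability mass that spreads away from the target;
    module response = mass reaching genes of a disease module."""
    effect = rwr(adj, target)
    magnitude = 1.0 - effect[target]
    module_response = sum(effect[g] for g in module if g != target)
    return w_mag * magnitude + w_mod * module_response


# Toy chain network A - B - C - D with a one-gene "disease module" {A}.
adj = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = {g: perturbation_score(adj, g, {"A"}) for g in ["B", "C", "D"]}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Ranking candidates by such a score is a miniature version of screening 265 targets. Therapeutic effect and statistical significance would have to be added as further terms. One option for significance would be a permutation test over random seed genes, which is an assumption and not the paper's procedure.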
Accurately predicting protein-ligand interactions is vital for structure-based drug discovery. Although deep learning (DL) models have shown strong performance, the potential of traditional statistical potentials under data-limited conditions remains underexplored. Here, we systematically assess several statistical potential models in docking and virtual screening. We find that docking benefits from distance-dependent pairwise atom-atom potentials with clear physical meanings, whereas screening relies more on orientation-dependent atom-residue potentials that capture local chemical environments. Based on these findings, we propose HybridSP, a hybrid potential combining distance-dependent atom-atom, atom-residue, and orientation-dependent atom-residue terms. An affinity-weighted scheme is applied to correct biases in statistical distributions. On the CASF-2016 benchmark, HybridSP achieves a 91.6% docking success rate and an enrichment factor of 29.35 at the top 1%, rivaling and even surpassing state-of-the-art DL models. Its strong screening ability is further validated on the Directory of Useful Decoys-Enhanced (DUD-E) and Directory of Useful Decoys-Adjusted datasets. These results demonstrate that well-designed statistical potentials can achieve high performance and interpretability without complex DL architectures, offering an efficient alternative for scoring function design. The models are available at: https://github.com/zelixirSH/HybridSP.git.
{"title":"Could statistical potential models achieve comparable or better performance than deep learning models?","authors":"Zhihao Wang, Sheng Wang, Jingjing Guo, Yuguang Mu, Xiangdong Liu, Liangzhen Zheng, Weifeng Li","doi":"10.1093/bib/bbag088","DOIUrl":"10.1093/bib/bbag088","url":null,"abstract":"<p><p>Accurately predicting protein-ligand interactions is vital for structure-based drug discovery. Although deep learning (DL) models have shown strong performance, the potential of traditional statistical potentials under data-limited conditions remains underexplored. Here, we systematically assess several statistical potential models in docking and virtual screening. We find that docking benefits from distance-dependent pairwise atom-atom potentials with clear physical meanings, while screening relies more on orientation-dependent atom-residue potentials that capture local chemical environments. Based on these findings, we propose HybridSP, a hybrid potential combining distance-dependent atom-atom, atom-residue, and orientation-dependent atom-residue terms. An affinity-weighted scheme is applied to correct biases in statistical distributions. On the CASF-2016 benchmark, HybridSP achieves a 91.6% docking success rate and an enrichment factor of 29.35 at the top 1%, rivaling and even surpassing state-of-the-art DL models. Its strong screening ability is further validated on directory of useful decoys-enhanced and directory of useful decoys-adjusted. These results demonstrate that well-designed statistical potentials can achieve high performance and interpretability without complex DL architectures, offering an efficient alternative for scoring function design. 
The models are available at: https://github.com/zelixirSH/HybridSP.git.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951076/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147324693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
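Statistical potentials of the kind HybridSP combines are conventionally derived by inverse Boltzmann inversion of observed versus reference contact statistics. The sketch below shows that generic recipe for a single distance-dependent term. It adds one possible reading of "affinity-weighted" counting and a weighted sum of terms. The reference-state counts, the affinity weighting rule, kT, and the term weights are assumptions for illustration. HybridSP's actual formulation is in the linked repository.

```python
import math

KT = 0.593  # kcal/mol at ~298 K; an assumed unit choice


def inverse_boltzmann(obs_counts, ref_counts, pseudo=1.0, kt=KT):
    """Distance-binned pair potential E(r) = -kT * ln(P_obs(r) / P_ref(r)).

    obs_counts: contacts of one atom-type pair per distance bin in native complexes.
    ref_counts: counts for the same bins under a reference state.
    Pseudocounts keep empty bins finite.
    """
    n_obs = sum(obs_counts) + pseudo * len(obs_counts)
    n_ref = sum(ref_counts) + pseudo * len(ref_counts)
    return [
        -kt * math.log(((o + pseudo) / n_obs) / ((r + pseudo) / n_ref))
        for o, r in zip(obs_counts, ref_counts)
    ]


def affinity_weighted_counts(per_complex_counts, affinities):
    """Hypothetical affinity weighting: each training complex contributes its
    bin counts scaled by affinity (e.g. pKd) divided by the mean affinity."""
    mean_aff = sum(affinities) / len(affinities)
    totals = [0.0] * len(per_complex_counts[0])
    for counts, aff in zip(per_complex_counts, affinities):
        for i, c in enumerate(counts):
            totals[i] += c * aff / mean_aff
    return totals


def hybrid_score(term_energies, weights):
    """Weighted sum of term energies (e.g. atom-atom, atom-residue and
    orientation-dependent atom-residue); lower means more favorable."""
    return sum(w * e for w, e in zip(weights, term_energies))


# Toy example: two distance bins; bin 0 is enriched in native contacts.
obs = affinity_weighted_counts([[6, 1], [4, 0]], affinities=[8.0, 6.0])
energies = inverse_boltzmann(obs, [5, 5])
```

Bins that are over-represented relative to the reference state receive negative (favorable) energies. In a hybrid score, each term would be summed over all relevant pairs in a pose before weighting.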
Wasif Jalal, Mubasshira Musarrat, Md Abul Hassan Samee, M Sohel Rahman
Aging is a fundamental biological process that profoundly influences health and disease, yet the interplay between tissue-specific aging and mortality remains underexplored. This study applies machine learning to GTEx transcriptomic data to model tissue-specific biological age across 12 tissue types and introduces an age-gap metric to quantify deviations from chronological age. We evaluate several modeling techniques combined with three feature selection strategies: Pearson correlation, age-related differentially expressed genes, and tissue-enriched genes (expressed at least four-fold higher in a specific tissue). Among these, Pearson correlation combined with elastic net regression performs best, with models achieving an average root mean squared error of 6.44 years and an R² of 0.64. To quantify deviations from chronological age relative to the population, we train neural networks to regress predicted ages against chronological ages and subtract their outputs from the predicted ages, yielding a metric we call the age-gap. Age-gap statistics reveal significant tissue-specific aging patterns, identify extreme agers, and show correlations between extreme aging and mortality. About 20% of subjects exhibit extreme aging in one tissue, while 1% show multi-organ aging. Further analysis reveals that accelerated aging in specific tissues correlates with a greater risk of death from illness. These findings underscore the role of transcriptomics in aging research and its implications for health and longevity.
{"title":"ORANGE: a machine learning approach for modeling tissue-specific aging from transcriptomic data.","authors":"Wasif Jalal, Mubasshira Musarrat, Md Abul Hassan Samee, M Sohel Rahman","doi":"10.1093/bib/bbag093","DOIUrl":"10.1093/bib/bbag093","url":null,"abstract":"<p><p>Despite aging being a fundamental biological process that profoundly influences health and disease, the interplay between tissue-specific aging and mortality remains underexplored. This study applies machine learning on GTEx transcriptomic data to model tissue-specific biological ages across 12 different types of tissues and introduces an age-gap metric to quantify deviations from the chronological age. We use several modeling techniques optimized with three feature selection strategies: Pearson correlation, age-related differentially expressed genes, and tissue-enriched genes (expressed at least four-fold higher in a specific tissue). Among these, Pearson correlation combined with elastic net regression yields the best performance, with models achieving an average root mean squared error of 6.44 years and an R2 of 0.64. To quantify deviations from chronological age relative to the population, we train neural networks to regress predicted ages against chronological ages, and subtract their outputs from the predicted ages to calculate a metric that we call the age-gap. Age-gap statistics reveal significant tissue-specific aging patterns, identifying extreme agers and correlations between extreme aging and mortality. About 20% of subjects are found to exhibit extreme aging in one tissue, while 1% show multi-organ aging. Further analysis reveals that accelerated aging in specific tissues correlates with greater risk of death from illness. 
These findings greatly emphasize the role of transcriptomics in aging research and its implications for health and longevity.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951074/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147324766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
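The age-gap construction described above can be sketched as follows. The authors train neural networks to regress predicted age on chronological age. This sketch swaps in an ordinary least-squares line so that it stays dependency-free. The z-score cutoff for "extreme agers" is also a hypothetical choice and is not the paper's definition.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares fit y ~ slope * x + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def age_gaps(chronological, predicted):
    """Age gap = predicted age minus the predicted age expected for a subject of
    that chronological age (here from a linear fit, not a neural network)."""
    slope, intercept = fit_line(chronological, predicted)
    return [p - (slope * c + intercept) for c, p in zip(chronological, predicted)]


def extreme_agers(gaps, z=2.0):
    """Indices whose age gap lies more than `z` standard deviations from the
    mean; the cutoff is an illustrative assumption."""
    mu, sd = statistics.fmean(gaps), statistics.pstdev(gaps)
    return [i for i, g in enumerate(gaps) if abs(g - mu) > z * sd]


# Predictions pulled toward the cohort mean: every subject sits on the trend.
gaps = age_gaps([20, 30, 40, 50, 60], [25, 30, 35, 40, 45])
```

Regressing predicted on chronological age absorbs the systematic shrinkage of predictions toward the cohort mean. In the toy data, predictions that follow that trend exactly therefore get an age gap of zero, even though raw predicted-minus-chronological differences range from +5 to -15 years.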
Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying the rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier powered by an intelligent block-partitioned database structure and optimized search strategies, enabling ultra-scalable, memory-efficient pan-domain profiling. On the Critical Assessment of Metagenome Interpretation II (CAMI II) benchmark, Kun-peng reduces memory usage for database building and querying by up to 24-fold and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we performed classification using a 4.3 TB pan-domain database comprising 204,477 genomes, which Kun-peng built with only 4.1 GB peak memory. Kun-peng processed each sample in 0.2-11.2 min with 4.0-35.4 GB peak memory, a 54-473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng achieved up to a 46-fold speedup while requiring 21-fold less memory. Kun-peng classified 69.8%-94.3% of reads, improving coverage by 20%-60% over the standard Kraken2 database with 62,026 genomes. This improvement reflects expanded reference coverage, although k-mer-based methods inherently produce a small fraction of false positives. Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid and scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.
{"title":"Kun-peng enables scalable and accurate pan-domain metagenomic classification.","authors":"Qiong Chen, Boliang Zhang, Chen Peng, Jiajun Huang, Zhen Liu, Xiaotao Shen, Chao Jiang","doi":"10.1093/bib/bbag119","DOIUrl":"10.1093/bib/bbag119","url":null,"abstract":"<p><p>Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying the rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier powered by an intelligent block-partitioned database structure and optimized search strategies, enabling ultra-scalable, memory-efficient pan-domain profiling. Using the Critical Assessment of Metagenome Interpretation II benchmark, Kun-peng substantially reduces the memory usage of database-building and querying by up to 24-fold, and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we performed classification using a 4.3 TB pan-domain database comprising 204,477 genomes, which was built by Kun-peng with only 4.1 GB peak memory. Kun-peng processed each sample in 0.2-11.2 min with 4.0-35.4 GB peak memory, corresponding to a 54-473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng achieved up to a 46-fold speedup while requiring 21-fold less memory. Kun-peng classified 69.8%-94.3% of reads, improving coverage by 20%-60% over the standard Kraken2 database with 62,026 genomes. This improvement reflects expanded reference coverage, although a small fraction of false positives is inherent to k-mer-based methods. 
Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid and scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12991049/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147466884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
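Kun-peng's database layout is only summarized as "block-partitioned". The toy sketch below illustrates the general idea under stated assumptions:

- k-mers are assigned to blocks by a stable hash;
- query k-mers are grouped by block, so only one block needs to be resident at a time;
- each read takes the taxon with the most k-mer hits.

Real classifiers such as Kraken2 assign shared k-mers to their lowest common ancestor and score root-to-leaf paths. Here, shared k-mers are simply marked ambiguous. K, the block count, and all function names are hypothetical.

```python
import zlib
from collections import Counter, defaultdict

K = 5          # hypothetical k-mer length (real tools use much larger k)
N_BLOCKS = 4   # hypothetical number of database blocks


def kmers(seq, k=K):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def block_of(kmer, n_blocks=N_BLOCKS):
    # Stable hash: Python's built-in hash() is salted per process.
    return zlib.crc32(kmer.encode()) % n_blocks


def build_blocks(references, n_blocks=N_BLOCKS):
    """references: taxon -> sequence. Each block maps k-mer -> taxon; k-mers
    seen in more than one taxon are marked ambiguous (None) instead of LCA."""
    blocks = [dict() for _ in range(n_blocks)]
    for taxon, seq in references.items():
        for km in kmers(seq):
            block = blocks[block_of(km, n_blocks)]
            if km in block and block[km] != taxon:
                block[km] = None
            else:
                block[km] = taxon
    return blocks


def classify(reads, blocks):
    """Bucket every read's k-mers by block, then visit blocks one at a time
    (on disk, only the current block would need to be loaded)."""
    by_block = defaultdict(list)  # block index -> [(read_id, kmer)]
    for rid, seq in reads.items():
        for km in kmers(seq):
            by_block[block_of(km, len(blocks))].append((rid, km))
    votes = defaultdict(Counter)
    for bi, queries in by_block.items():
        block = blocks[bi]
        for rid, km in queries:
            taxon = block.get(km)
            if taxon is not None:
                votes[rid][taxon] += 1
    return {rid: votes[rid].most_common(1)[0][0] if votes[rid] else "unclassified"
            for rid in reads}


refs = {"taxA": "ACGTACGTTGCA", "taxB": "TTTTGGGGCCCC"}
blocks = build_blocks(refs)
calls = classify({"r1": "ACGTACGTTG", "r2": "TTGGGGCCC", "r3": "AAAAAAAAA"}, blocks)
```

Grouping queries by block trades one extra pass over the reads for never holding the whole index in memory. This is the kind of trade-off that allows a multi-terabyte database to be queried with modest peak memory.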
The pathological aggregation of α-synuclein (α-syn) is a pivotal hallmark in the progression of neurodegenerative disorders, including Parkinson's disease, underscoring the need to identify site-specific ligands. This study presents, for the first time, a deep learning framework specifically designed to predict molecular properties associated with α-syn. The framework integrates graph-based contextual attention, structural feature aggregation, and dual-channel feature integration. It is trained with a composite objective that combines mean squared error (MSE) minimization, Kullback-Leibler (KL) divergence regularization of the latent space, and an L2 norm penalty, and it achieves an MSE of 0.1812 on the independent test set. Mechanistic analysis with GNNExplainer and molecular docking (PDB: 6A6B) indicates that aromatic ring systems (benzene ring significance: 0.737) and hydrogen bond donor groups (amino group significance: 0.438) play critical roles in mediating high-affinity ligand-receptor interactions. These interactions occur through π-π stacking within the hydrophobic pocket formed by residues Val82 and Ala89 and through directed hydrogen bonding involving the catalytic residues Ser42 and Lys45. These findings not only enhance the understanding of inhibitor mechanisms but also establish a novel framework for the preliminary screening of small-molecule therapeutics, laying a rigorous foundation for structure-guided drug optimization and rational molecular design.
{"title":"Drug screening for α-synuclein aggregation inhibitors via multimodal graph neural network.","authors":"Tingle Gu, Zixu Ran, Wenyin Li, Xudong Guo, Bo Li, Fuyi Li, Cangzhi Jia","doi":"10.1093/bib/bbag118","DOIUrl":"10.1093/bib/bbag118","url":null,"abstract":"<p><p>The pathological aggregation of α-synuclein (α-syn) constitutes a pivotal hallmark in the progression of neurodegenerative disorders, including Parkinson's disease, underscoring the imperative need for identifying site-specific ligands. This study presents, for the first time, an advanced deep learning framework specifically designed for the prediction of molecular properties associated with α-syn. The framework integrates graph-based contextual attention mechanisms, structural feature aggregation protocols, and dual-channel feature integration, complemented by a composite regularization strategy that synergizes mean squared error minimization, Kullback-Leibler divergence-induced latent space regularization, and L2 norm penalization, thereby delivering outstanding predictive accuracy on the independent test dataset with MSE of 0.1812. Mechanistic insights derived from GNNExplainer analysis and molecular docking studies (PDB: 6A6B) elucidated that aromatic ring systems (benzene ring significance: 0.737) and hydrogen bond donor groups (amino group significance: 0.438) play critical roles in mediating high-affinity ligand-receptor interactions through π-π stacking within the hydrophobic pocket formed by Val82 and Ala89 residues, as well as directed hydrogen bonding involving catalytic residues Ser42 and Lys45. 
These findings not only enhance the understanding of inhibitor mechanisms but also establish a novel framework for the preliminary screening of small-molecule therapeutics, thereby laying a rigorous groundwork for structure-guided drug optimization and rational molecular design.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13006971/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147497677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
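The composite objective in the abstract, MSE plus a KL term on the latent space plus an L2 penalty, can be written out directly. This is a minimal sketch that assumes a Gaussian latent with a standard-normal prior, as in a variational autoencoder. The weights `beta` and `lam` are placeholders, not the published values.

```python
import math


def composite_loss(y_true, y_pred, mu, logvar, weights, beta=0.01, lam=1e-4):
    """MSE + beta * KL(N(mu, sigma^2) || N(0, I)) + lam * ||w||^2.

    mu, logvar: latent mean and log-variance for one molecule embedding.
    weights: flattened model parameters subject to the L2 penalty.
    """
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    # Closed-form KL divergence between a diagonal Gaussian and N(0, I).
    kl = -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    l2 = sum(w * w for w in weights)
    total = mse + beta * kl + lam * l2
    return total, {"mse": mse, "kl": kl, "l2": l2}


total, parts = composite_loss(
    y_true=[0.0, 0.0], y_pred=[1.0, 1.0], mu=[1.0], logvar=[0.0],
    weights=[2.0], beta=1.0, lam=0.25,
)
```

Returning the individual terms alongside the total makes it easy to check that no single term dominates training.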
Yi Jia, Chan Zhang, Han Zhang, Kang Dong, Yuruo Hu, Yinan Wang, Zicheng Zhao
Cancer classification is pivotal for precision oncology, yet traditional methods struggle with the molecular heterogeneity of tumors. Our study introduces a self-attention-based Conv1D neural network designed for panel capture sequencing data, the data type more commonly used in clinical settings. Combining clinical capture sequencing data with The Cancer Genome Atlas (TCGA) data, we achieved an overall classification accuracy of over 90%, with precision reaching 100% for cervical and gastric cancers. Recall was highest for gastric cancer (95.79%) and lowest for cervical cancer (77.46%), indicating robust performance across cancer types. The model identified key genes such as C3orf36, JHY, and TASP1, whose mutation counts differ significantly across cancers. Enrichment analysis of high-impact genes highlighted critical pathways such as acute myeloid leukemia and adipocytokine signaling. This approach not only significantly improves the precision of cancer classification, demonstrating potential for clinical application, but also enhances our understanding of cancer biology.
{"title":"Enhancing cancer classification accuracy with a self-attention network using panel capture sequencing data.","authors":"Yi Jia, Chan Zhang, Han Zhang, Kang Dong, Yuruo Hu, Yinan Wang, Zicheng Zhao","doi":"10.1093/bib/bbag120","DOIUrl":"10.1093/bib/bbag120","url":null,"abstract":"<p><p>Cancer classification is pivotal for precision oncology, yet traditional methods struggle with the molecular heterogeneity of tumors. Our study introduces a self-attention based Conv1D machine learning network designed for panel capture sequencing data, which is more commonly used in clinical settings. Combining clinical capture sequencing data and The Cancer Genome Atlas data, we achieved an overall classification accuracy of over 90%, with precision rates reaching 100% for cervical and gastric cancers. Additionally, recall rates were highest at 95.79% for gastric cancer and lowest at 77.46% for cervical cancer, demonstrating robust performance across various cancer types. The model identified key genes such as C3orf36, JHY, and TASP1, showing significant differences in mutation counts across cancers. High-impact gene enrichment analysis highlighted critical pathways like acute myeloid leukemia and adipocytokine signaling. 
This approach not only significantly improves the precision of cancer classification, demonstrating the potential for clinical application, but also enhances our understanding of cancer biology.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13006975/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147497746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
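A self-attention layer over Conv1D features can be sketched in plain Python as below. The sketch assumes a single attention head, a 1-D input profile per sample, and hand-set projection matrices. The abstract does not give the paper's layer sizes, input encoding, or classification head, so everything here is illustrative.

```python
import math


def conv1d(signal, kernels):
    """'Valid' 1-D convolution (cross-correlation, as in most DL libraries).
    Returns one feature vector per position, one channel per kernel."""
    k = len(kernels[0])
    return [
        [sum(ker[j] * signal[i + j] for j in range(k)) for ker in kernels]
        for i in range(len(signal) - k + 1)
    ]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(W, x):
    """Matrix-vector product; W is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]


def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the positions of X."""
    Q = [project(Wq, x) for x in X]
    K = [project(Wk, x) for x in X]
    V = [project(Wv, x) for x in X]
    scale = math.sqrt(len(Q[0]))
    out, weights = [], []
    for q in Q:
        a = softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K])
        weights.append(a)
        out.append([sum(a[j] * V[j][c] for j in range(len(V)))
                    for c in range(len(V[0]))])
    return out, weights


# Toy input: a short per-gene profile (hypothetical values) and two kernels.
features = conv1d([1, 2, 3, 4], kernels=[[1, 0], [0, 1]])
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
out, attn = self_attention(features, Wq=Z2, Wk=Z2, Wv=I2)
```

With zero query and key projections, every position attends uniformly, so each output row is the mean of the value vectors. This is a quick sanity check before trained weights make the attention selective. A pooling step and a linear softmax layer over cancer types would follow in a full classifier.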
Spatially variable genes (SVGs) are essential for elucidating tissue organization within spatially resolved transcriptomics. While a number of computational methods have been developed for SVG identification, their reliance on algorithm-specific assumptions, such as predefined kernel functions or spatial neighborhood graphs, often results in substantial variability in sensitivity and inflated false discovery rates (FDRs) across heterogeneous datasets. To address this challenge, we develop Castl, an ensemble-based framework for SVG identification that integrates multiple detection methods through statistically designed aggregation modules. Comprehensive evaluations on both simulated and real-world data demonstrate that Castl consistently identifies biologically meaningful spatial expression patterns, mitigates method-specific biases, and effectively controls FDRs across various biological contexts, resolutions, and spatial technologies. This flexible, assumption-free framework offers a robust and standardized foundation for spatially informed feature discovery in complex biological systems.
{"title":"Castl: robust identification of spatially variable genes in spatial transcriptomics via an ensemble-based framework.","authors":"Yiyi Yu, Jiyuan Yang, Ping-An He, Xiaoqi Zheng","doi":"10.1093/bib/bbag074","DOIUrl":"10.1093/bib/bbag074","url":null,"abstract":"<p><p>Spatially variable genes (SVGs) are essential for elucidating tissue organization within spatially resolved transcriptomics. While a number of computational methods have been developed for SVG identification, their reliance on algorithm-specific assumptions, such as predefined kernel functions or spatial neighborhood graphs, often results in substantial variability in sensitivity and inflated false discovery rates (FDRs) across heterogeneous datasets. To address this challenge, we here develop Castl, an ensemble-based framework for SVG identification that integrates multiple detection methods through statistically designed aggregation modules. Comprehensive evaluations on both simulated and real-world data demonstrate that Castl consistently identifies biologically meaningful spatial expression patterns, mitigates method-specific biases and effectively controls FDRs across various biological contexts, resolutions, and spatial technologies. 
This flexible, assumption-free framework offers a robust and standardized foundation for spatially informed feature discovery in complex biological systems.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12963980/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147364150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
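The abstract does not name Castl's aggregation modules. One standard way to merge p-values from several SVG detectors and then control the FDR is the Cauchy combination test (ACAT) followed by Benjamini-Hochberg adjustment. The sketch below shows that swapped-in pipeline, which may differ from Castl's own statistics. The input dictionary and the alpha level are hypothetical.

```python
import math


def cauchy_combine(pvals, eps=1e-15):
    """Equal-weight Cauchy combination test (ACAT); remains approximately
    valid when the per-method p-values are dependent."""
    ps = [min(max(p, eps), 1 - eps) for p in pvals]  # avoid tan(+/- pi/2)
    t = sum(math.tan((0.5 - p) * math.pi) for p in ps) / len(ps)
    return 0.5 - math.atan(t) / math.pi


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    q = [0.0] * n
    running = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        q[i] = running
    return q


def ensemble_svgs(method_pvals, alpha=0.05):
    """method_pvals: gene -> list of p-values, one per SVG detection method."""
    genes = list(method_pvals)
    combined = [cauchy_combine(method_pvals[g]) for g in genes]
    q = bh_adjust(combined)
    return sorted(g for g, qi in zip(genes, q) if qi <= alpha)


calls = ensemble_svgs({
    "g1": [1e-6, 1e-4, 0.01],   # all detectors agree: strongly spatial
    "g2": [0.6, 0.4, 0.9],      # no detector finds a pattern
    "g3": [0.001, 0.3, 0.02],   # detectors disagree; combined evidence still strong
})
```

The Cauchy statistic is dominated by the smallest p-values. As a result, a gene flagged strongly by a subset of methods, like g3, can survive even when one detector misses it.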
The development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for elucidating cell heterogeneity and gene expression. Identifying and discovering cell types through cell clustering is a crucial step in analyzing scRNA-seq data. However, the high dimensionality and frequent dropout events of these data pose great challenges for cell clustering. Here, we propose a novel contrastive clustering framework called scSCCNIA (Similarity-matrix-based Contrastive Clustering with Neighbor Information Aggregation) for accurately identifying cell clusters from scRNA-seq data. scSCCNIA applies a Laplacian filter to aggregate neighbor information, constructs different graph views for data augmentation using Siamese encoders with unshared parameters, and learns latent low-dimensional embeddings via similarity-matrix-based contrastive learning. Comparative analyses of multiple scRNA-seq datasets from different platforms and with varying cell numbers demonstrate that scSCCNIA outperforms existing methods in cell clustering and marker gene identification. Furthermore, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of scSCCNIA results reveal the heterogeneity and functional specificity of various cell types. Overall, scSCCNIA is an effective algorithm for learning latent features from scRNA-seq data, improving cell type identification accuracy and facilitating downstream analyses.
{"title":"scSCCNIA: similarity matrix based contrastive clustering with neighbor information aggregation for single-cell RNA sequencing data.","authors":"Jing Wang, Junfeng Xia, Yansen Su, Chun-Hou Zheng","doi":"10.1093/bib/bbag094","DOIUrl":"10.1093/bib/bbag094","url":null,"abstract":"<p><p>The development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for elucidating cell heterogeneity and gene expression. Identifying and discovering cell types through cell clustering is a crucial step in analyzing scRNA-seq data. However, the high-dimensionality nature and frequent dropout events of the data raise great challenges for cell clustering. Here, we propose a novel contrastive clustering framework called scSCCNIA (Similarity-matrix-based Contrastive Clustering with Neighbor Information Aggregation), for the accurate identification of cell clusters from scRNA-seq data. scSCCNIA adopts a Laplacian filter to conduct neighbor information aggregation, constructs different graph views by using special un-shared parameters Siamese encoders for data augmentation, and learns the latent low-dimensional embedding representations via similarity-matrix-based contrastive learning. Comparative analyses of multiple scRNA-seq datasets from different platforms and with varying cell numbers demonstrate that scSCCNIA outperforms existing methods in terms of cell clustering and marker gene identification. Furthermore, scSCCNIA reveals the heterogeneity and functional specificity of various cell types through Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes enrichment analyses. 
Overall, scSCCNIA is an effective algorithm for learning latent features from scRNA-seq data, enhancing cell type identification accuracy and facilitating downstream analyses of scRNA-seq data.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12962064/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147364179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
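The Laplacian filter step can be illustrated with a common low-pass graph filter, H = I - k·L_sym, applied to cell features over a cell-cell neighbor graph. This sketch rests on several assumptions: self-loops, symmetric normalization, k = 0.5, and t rounds of filtering. scSCCNIA's exact filter and graph construction may differ, and the contrastive part of the method is not shown.

```python
import math


def laplacian_smooth(X, edges, k=0.5, t=1):
    """Apply H = I - k * L_sym to cell features t times.

    X: list of per-cell feature vectors; edges: undirected (i, j) pairs from a
    cell-cell kNN graph. L_sym = I - D^-1/2 (A + I) D^-1/2, so
    H x_i = (1 - k) x_i + k * sum_j (A + I)_ij / sqrt(d_i d_j) * x_j.
    """
    n = len(X)
    nbrs = [{i} for i in range(n)]  # self-loops
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    deg = [len(s) for s in nbrs]
    for _ in range(t):
        smoothed = []
        for i in range(n):
            agg = [0.0] * len(X[0])
            for j in nbrs[i]:
                w = 1.0 / math.sqrt(deg[i] * deg[j])
                for c, v in enumerate(X[j]):
                    agg[c] += w * v
            smoothed.append([(1 - k) * xi + k * a for xi, a in zip(X[i], agg)])
        X = smoothed
    return X


# Cells 0 and 1 are neighbors; cell 2 is isolated and keeps its features.
smoothed = laplacian_smooth([[0.0], [2.0], [5.0]], edges=[(0, 1)])
```

Each filtering round pulls a cell's profile toward its neighbors. This can partly compensate for dropout zeros before the encoders see the data.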
G protein-coupled receptors (GPCRs) are among the most important drug targets, and peptide therapeutics are rapidly emerging. However, accurate prediction of peptide-GPCR interactions (PepGI) remains challenging due to the scarcity of high-quality data and the poor generalization of existing drug-target interaction (DTI) models, which are largely trained on small-molecule data. Here, we introduce a progressive fine-tuning framework with a dynamic parameter selection strategy that adaptively selects critical fine-tuning parameters using Fisher information. Our method begins with pretraining on a large small-molecule-GPCR dataset, followed by intermediate fine-tuning on peptide-target data to alleviate the representation mismatch across heterogeneous ligand modalities. Finally, task-specific fine-tuning is performed in the low-resource PepGI setting. Extensive experiments show that our approach significantly outperforms baselines across multiple evaluation metrics and exhibits robust generalization under few-shot and practical cold-start settings. Overall, this work offers an effective solution for low-resource peptide-GPCR prediction and presents a transferable framework for cross-structure DTI modeling.
{"title":"A progressive fine-tuning framework with dynamic parameter selection for low-resource peptide-GPCR interaction prediction.","authors":"Mingqing Liu, Jinhui Xu, Ji Liu","doi":"10.1093/bib/bbag116","DOIUrl":"10.1093/bib/bbag116","url":null,"abstract":"<p><p>G protein-coupled receptors (GPCRs) are among the most important drug targets, and peptide therapeutics are rapidly emerging. However, accurate prediction of peptide-GPCR interactions (PepGI) remains challenging due to the scarcity of high-quality data and the poor generalization of existing drug-target interaction (DTI) models, which are largely trained on small molecule data. Here, we introduce a progressive fine-tuning framework with a dynamic parameter selection strategy that adaptively selects critical fine-tuning parameters using Fisher information. Our method begins with pretraining on a large small molecule-GPCR dataset, followed by intermediate fine-tuning on peptide-target data to alleviate the representation mismatch across heterogeneous ligand modalities. Finally, the task-specific fine-tuning is performed on the low-resource PepGI scenario. Extensive experiments show that our approach significantly outperforms baselines across multiple evaluation metrics, and exhibits robust generalization under few-shot and practical cold-start settings. Overall, this work offers an effective solution for low-resource peptide-GPCR prediction and presents a transferable framework for cross-structure DTI modeling.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12991051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147466888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
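The abstract does not specify how Fisher information drives parameter selection, so the following is only a minimal sketch of one common reading, under stated assumptions. A logistic model stands in for the DTI network. Each parameter is scored by the diagonal empirical Fisher information (the mean squared per-example gradient of the log-likelihood). A top-`fraction` rule then decides which parameters are updated, and that selection is redone at every stage of a progressive schedule. All function names, the `fraction`, `lr` and `steps` parameters, and the synthetic data are hypothetical illustrations, not the authors' code or datasets.

```python
import numpy as np

# Hypothetical sketch: a logistic model stands in for the DTI network,
# and "Fisher information" is approximated by the diagonal empirical Fisher.


def per_sample_grads(w, X, y):
    """Per-example gradients of log p(y | x, w) for p(y=1|x) = sigmoid(x @ w).

    Returns an array of shape (n_samples, n_params)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    # For logistic regression, d/dw log p(y|x) = (y - p) * x.
    return (y - p)[:, None] * X


def diagonal_fisher(w, X, y):
    """Empirical diagonal Fisher: mean squared per-example gradient."""
    return np.mean(per_sample_grads(w, X, y) ** 2, axis=0)


def select_parameters(fisher, fraction):
    """Boolean mask marking the top `fraction` of parameters by Fisher score."""
    k = max(1, int(round(fraction * fisher.size)))
    mask = np.zeros(fisher.size, dtype=bool)
    mask[np.argsort(fisher)[-k:]] = True
    return mask


def fine_tune(w, X, y, fraction=0.25, lr=0.1, steps=200):
    """Gradient ascent on the mean log-likelihood that updates only the
    Fisher-selected parameters; all other parameters stay frozen."""
    mask = select_parameters(diagonal_fisher(w, X, y), fraction)
    w = w.copy()
    for _ in range(steps):
        grad = per_sample_grads(w, X, y).mean(axis=0)
        w[mask] += lr * grad[mask]
    return w, mask


def progressive_fine_tune(w, stages, fraction=0.25):
    """Fine-tune through (X, y) stages in order (e.g. intermediate peptide
    data, then the target task), re-selecting parameters at each stage."""
    masks = []
    for X, y in stages:
        w, mask = fine_tune(w, X, y, fraction=fraction)
        masks.append(mask)
    return w, masks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    true_w = rng.normal(size=d)

    def make_stage(n):
        # Synthetic stand-in data; not the datasets used in the paper.
        X = rng.normal(size=(n, d))
        y = (X @ true_w + 0.5 * rng.normal(size=n) > 0).astype(float)
        return X, y

    # "Pretraining": fraction=1.0 makes every parameter trainable.
    w_pre, _ = fine_tune(np.zeros(d), *make_stage(500), fraction=1.0)
    w_final, masks = progressive_fine_tune(w_pre, [make_stage(100), make_stage(20)])
    print([int(m.sum()) for m in masks])  # [2, 2]: 2 of 8 parameters per stage
```

In this sketch, recomputing the Fisher scores at the start of each stage is what makes the selection "dynamic". A real network would more likely score weight tensors or parameter groups than individual entries of a single weight vector.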
{"title":"Publisher's Note: Addendum to Volume 26, Issue Supplement 1, December 2025, International Conference on Genome Informatics ISCB-Asia 2025 Abstract Book.","authors":"","doi":"10.1093/bib/bbag026","DOIUrl":"10.1093/bib/bbag026","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"27 2","pages":""},"PeriodicalIF":7.7,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12972659/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147389551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}