
Briefings in Bioinformatics: latest articles

Computational framework for therapeutic target discovery via perturbation simulation: application to cystic fibrosis airway disease.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag129
George Sun, Yi-Hui Zhou

Computational methods for therapeutic target discovery face challenges in integrating multi-scale biological data and predicting system-wide therapeutic effects. We present a computational framework that integrates single-cell transcriptomics, weighted gene co-expression network analysis (WGCNA), and computational perturbation simulation to systematically discover novel therapeutic targets. The framework constructs a knowledge graph comprising 29 896 nodes (23 530 genes and 6366 pathways) with 322 136 edges, integrating gene-gene, gene-pathway, and module relationships. Using perturbation simulation algorithms, we systematically explored 265 candidate targets, scoring each based on perturbation magnitude, module response, therapeutic effect, and statistical significance. Applied to single-cell RNA sequencing data from 38 patients (51 415 cells), the framework identified 66 novel therapeutic targets, including nine very high novelty targets. Computational validation demonstrates efficient scalability (knowledge graph construction: <5 min; 265 perturbation simulations: <2 min) and robust performance across different module sizes. This approach represents a novel computational method for systems-level therapeutic target discovery, with generalizable applications to other complex diseases.
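The abstract does not spell out the perturbation algorithm, so the following is only a minimal sketch of what a perturbation simulation on a gene graph can look like. It assumes a small weighted adjacency matrix and a damped, step-wise propagation of a unit knockdown; the function names, the damping factor, and the toy graph are all hypothetical and are not taken from the paper.

```python
import numpy as np

def simulate_knockdown(adj, target, steps=3, damping=0.5):
    """Propagate a unit knockdown of `target` through a weighted gene graph.

    adj[i, j] is an assumed influence weight of gene i on gene j.
    Returns the accumulated signed change for every gene.
    """
    adj = np.asarray(adj, dtype=float)
    # Row-normalise so each gene splits its change among its downstream genes.
    row_sums = np.abs(adj).sum(axis=1, keepdims=True)
    norm = np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)
    delta = np.zeros(adj.shape[0])
    delta[target] = -1.0                      # knock the target down by one unit
    total = delta.copy()
    for _ in range(steps):
        delta = damping * (norm.T @ delta)    # pass the change one hop downstream
        total += delta
    return total

def perturbation_magnitude(total, target):
    """Summed absolute change over all genes except the perturbed one."""
    return float(np.abs(np.delete(total, target)).sum())

# Toy graph (hypothetical): gene 0 -> genes 1 and 3, gene 1 -> gene 2.
toy_adj = np.array([
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)
toy_effect = simulate_knockdown(toy_adj, target=0)
toy_score = perturbation_magnitude(toy_effect, target=0)
```

A candidate-ranking loop would call `simulate_knockdown` once per candidate and sort by a score such as `perturbation_magnitude`; the module-response, therapeutic-effect, and significance terms mentioned in the abstract would need additional inputs that are not described there.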

Volume 27, Issue 2 Publication type: Journal Article
Citations: 0
Could statistical potential models achieve comparable or better performance than deep learning models?
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag088
Zhihao Wang, Sheng Wang, Jingjing Guo, Yuguang Mu, Xiangdong Liu, Liangzhen Zheng, Weifeng Li

Accurately predicting protein-ligand interactions is vital for structure-based drug discovery. Although deep learning (DL) models have shown strong performance, the potential of traditional statistical potentials under data-limited conditions remains underexplored. Here, we systematically assess several statistical potential models in docking and virtual screening. We find that docking benefits from distance-dependent pairwise atom-atom potentials with clear physical meanings, while screening relies more on orientation-dependent atom-residue potentials that capture local chemical environments. Based on these findings, we propose HybridSP, a hybrid potential combining distance-dependent atom-atom, atom-residue, and orientation-dependent atom-residue terms. An affinity-weighted scheme is applied to correct biases in statistical distributions. On the CASF-2016 benchmark, HybridSP achieves a 91.6% docking success rate and an enrichment factor of 29.35 at the top 1%, rivaling and even surpassing state-of-the-art DL models. Its strong screening ability is further validated on directory of useful decoys-enhanced and directory of useful decoys-adjusted. These results demonstrate that well-designed statistical potentials can achieve high performance and interpretability without complex DL architectures, offering an efficient alternative for scoring function design. The models are available at: https://github.com/zelixirSH/HybridSP.git.
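Two quantities in this abstract can be illustrated briefly. The sketch below assumes the textbook inverse-Boltzmann form for a distance-dependent pair potential and the usual definition of the enrichment factor; the abstract does not state that HybridSP is computed exactly this way, and the function names and pseudocount are hypothetical.

```python
import numpy as np

def pair_potential(obs_counts, ref_counts, pseudo=1.0):
    """Inverse-Boltzmann potential per distance bin, in units of kT.

    u(r) = -ln( P_obs(r) / P_ref(r) ); a pseudocount avoids log(0).
    """
    obs = np.asarray(obs_counts, dtype=float) + pseudo
    ref = np.asarray(ref_counts, dtype=float) + pseudo
    return -np.log((obs / obs.sum()) / (ref / ref.sum()))

def enrichment_factor(scores, is_active, top_frac=0.01):
    """Hit rate in the best-scoring fraction divided by the overall hit rate.

    Assumes that a lower score means a better-ranked compound.
    """
    scores = np.asarray(scores, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    n_top = max(1, int(round(top_frac * len(scores))))
    top = np.argsort(scores)[:n_top]
    return is_active[top].mean() / is_active.mean()
```

With 1% of a library being active, the enrichment factor at the top 1% is bounded above by 100, which gives some context for the reported value of 29.35.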

Volume 27, Issue 2 Publication type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951076/pdf/
Citations: 0
ORANGE: a machine learning approach for modeling tissue-specific aging from transcriptomic data.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag093
Wasif Jalal, Mubasshira Musarrat, Md Abul Hassan Samee, M Sohel Rahman

Despite aging being a fundamental biological process that profoundly influences health and disease, the interplay between tissue-specific aging and mortality remains underexplored. This study applies machine learning to GTEx transcriptomic data to model tissue-specific biological ages across 12 different types of tissues and introduces an age-gap metric to quantify deviations from the chronological age. We use several modeling techniques optimized with three feature selection strategies: Pearson correlation, age-related differentially expressed genes, and tissue-enriched genes (expressed at least four-fold higher in a specific tissue). Among these, Pearson correlation combined with elastic net regression yields the best performance, with models achieving an average root mean squared error of 6.44 years and an R² of 0.64. To quantify deviations from chronological age relative to the population, we train neural networks to regress predicted ages against chronological ages, and subtract their outputs from the predicted ages to calculate a metric that we call the age-gap. Age-gap statistics reveal significant tissue-specific aging patterns, identifying extreme agers and correlations between extreme aging and mortality. About 20% of subjects are found to exhibit extreme aging in one tissue, while 1% show multi-organ aging. Further analysis reveals that accelerated aging in specific tissues correlates with greater risk of death from illness. These findings emphasize the role of transcriptomics in aging research and its implications for health and longevity.
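The age-gap construction described above can be sketched in a few lines. This sketch swaps the neural network regressor for an ordinary least-squares line purely to stay short; that substitution, the function name, and the toy numbers are assumptions for illustration and are not the authors' implementation.

```python
import numpy as np

def age_gap(chron_age, pred_age):
    """Predicted age minus the population-expected prediction at that age.

    The expected prediction is obtained by regressing predicted age on
    chronological age (here a straight line, standing in for the neural
    network described in the abstract).
    """
    chron_age = np.asarray(chron_age, dtype=float)
    pred_age = np.asarray(pred_age, dtype=float)
    slope, intercept = np.polyfit(chron_age, pred_age, deg=1)
    expected = slope * chron_age + intercept
    return pred_age - expected

# Toy data (hypothetical): the third subject looks older than expected.
toy_chron = np.array([30.0, 40.0, 50.0, 60.0, 70.0])
toy_pred = 0.8 * toy_chron + 10.0 + np.array([0.0, 0.0, 6.0, 0.0, 0.0])
toy_gap = age_gap(toy_chron, toy_pred)
```

Subtracting the fitted trend rather than the chronological age itself removes the systematic bias that age predictors often show toward the cohort mean, so a positive gap means "older than peers of the same age" rather than "older than the calendar says".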

Volume 27, Issue 2 Publication type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12951074/pdf/
Citations: 0
Kun-peng enables scalable and accurate pan-domain metagenomic classification.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag119
Qiong Chen, Boliang Zhang, Chen Peng, Jiajun Huang, Zhen Liu, Xiaotao Shen, Chao Jiang

Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying the rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier powered by an intelligent block-partitioned database structure and optimized search strategies, enabling ultra-scalable, memory-efficient pan-domain profiling. Using the Critical Assessment of Metagenome Interpretation II benchmark, Kun-peng substantially reduces the memory usage of database-building and querying by up to 24-fold, and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we performed classification using a 4.3 TB pan-domain database comprising 204,477 genomes, which was built by Kun-peng with only 4.1 GB peak memory. Kun-peng processed each sample in 0.2-11.2 min with 4.0-35.4 GB peak memory, corresponding to a 54-473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng achieved up to a 46-fold speedup while requiring 21-fold less memory. Kun-peng classified 69.8%-94.3% of reads, improving coverage by 20%-60% over the standard Kraken2 database with 62,026 genomes. This improvement reflects expanded reference coverage, although a small fraction of false positives is inherent to k-mer-based methods. Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid and scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.
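To make the idea of a block-partitioned database concrete, here is a deliberately simplified sketch: k-mers are routed to blocks by a hash, and a read is classified by visiting only the blocks its k-mers fall into, one block at a time. This is not Kun-peng's actual on-disk format or search strategy (real classifiers also use minimizers and lowest-common-ancestor logic); every name below is hypothetical.

```python
import zlib
from collections import Counter, defaultdict

def block_of(kmer, n_blocks):
    """Deterministically route a k-mer to one of n_blocks partitions."""
    return zlib.crc32(kmer.encode()) % n_blocks

def build_blocks(kmer_to_taxon, n_blocks):
    """Split a k-mer -> taxon map into independent blocks."""
    blocks = [dict() for _ in range(n_blocks)]
    for kmer, taxon in kmer_to_taxon.items():
        blocks[block_of(kmer, n_blocks)][kmer] = taxon
    return blocks

def classify(read, blocks, k):
    """Vote over k-mer hits, touching one block at a time."""
    n_blocks = len(blocks)
    by_block = defaultdict(list)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        by_block[block_of(kmer, n_blocks)].append(kmer)
    votes = Counter()
    for b, kmers in by_block.items():
        block = blocks[b]   # a real tool would load this block from disk here
        for kmer in kmers:
            if kmer in block:
                votes[block[kmer]] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy reference (hypothetical k-mers and taxa).
toy_db = {"ACGT": "taxA", "CGTA": "taxA", "TTTT": "taxB"}
toy_blocks = build_blocks(toy_db, n_blocks=4)
```

Because each block is self-contained, peak memory in this scheme is bounded by the largest block rather than by the whole database, which is the property the abstract attributes to the block-partitioned design.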

Volume 27, Issue 2 Publication type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12991049/pdf/
Citations: 0
Drug screening for α-synuclein aggregation inhibitors via multimodal graph neural network.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag118
Tingle Gu, Zixu Ran, Wenyin Li, Xudong Guo, Bo Li, Fuyi Li, Cangzhi Jia

The pathological aggregation of α-synuclein (α-syn) constitutes a pivotal hallmark in the progression of neurodegenerative disorders, including Parkinson's disease, underscoring the imperative need for identifying site-specific ligands. This study presents, for the first time, an advanced deep learning framework specifically designed for the prediction of molecular properties associated with α-syn. The framework integrates graph-based contextual attention mechanisms, structural feature aggregation protocols, and dual-channel feature integration, complemented by a composite regularization strategy that synergizes mean squared error minimization, Kullback-Leibler divergence-induced latent space regularization, and L2 norm penalization, thereby delivering outstanding predictive accuracy on the independent test dataset with MSE of 0.1812. Mechanistic insights derived from GNNExplainer analysis and molecular docking studies (PDB: 6A6B) elucidated that aromatic ring systems (benzene ring significance: 0.737) and hydrogen bond donor groups (amino group significance: 0.438) play critical roles in mediating high-affinity ligand-receptor interactions through π-π stacking within the hydrophobic pocket formed by Val82 and Ala89 residues, as well as directed hydrogen bonding involving catalytic residues Ser42 and Lys45. These findings not only enhance the understanding of inhibitor mechanisms but also establish a novel framework for the preliminary screening of small-molecule therapeutics, thereby laying a rigorous groundwork for structure-guided drug optimization and rational molecular design.
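The composite objective named in the abstract (mean squared error, a Kullback-Leibler latent regularizer, and an L2 penalty) can be written down as a short sketch. It assumes a VAE-style Gaussian latent with a standard normal prior, which the abstract does not confirm; the weighting coefficients `beta` and `lam` and the function name are hypothetical.

```python
import numpy as np

def composite_loss(y_true, y_pred, mu, logvar, weights, beta=1e-3, lam=1e-4):
    """MSE + beta * KL(q(z|x) || N(0, I)) + lam * sum of squared weights.

    mu, logvar: (batch, latent_dim) parameters of an assumed Gaussian latent.
    weights: iterable of parameter arrays to penalise.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # Closed-form KL divergence between N(mu, exp(logvar)) and N(0, I),
    # summed over latent dimensions and averaged over the batch.
    kl = 0.5 * np.mean(np.sum(mu ** 2 + np.exp(logvar) - logvar - 1.0, axis=1))
    l2 = sum(float(np.sum(np.asarray(w, dtype=float) ** 2)) for w in weights)
    return mse + beta * kl + lam * l2
```

In a training loop the same three terms would be computed on tensors from an autodiff framework; NumPy is used here only to keep the arithmetic visible.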

Volume 27, Issue 2 Publication type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13006971/pdf/
Citations: 0
Enhancing cancer classification accuracy with a self-attention network using panel capture sequencing data.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag120
Yi Jia, Chan Zhang, Han Zhang, Kang Dong, Yuruo Hu, Yinan Wang, Zicheng Zhao

Cancer classification is pivotal for precision oncology, yet traditional methods struggle with the molecular heterogeneity of tumors. Our study introduces a self-attention based Conv1D machine learning network designed for panel capture sequencing data, which is more commonly used in clinical settings. Combining clinical capture sequencing data and The Cancer Genome Atlas data, we achieved an overall classification accuracy of over 90%, with precision rates reaching 100% for cervical and gastric cancers. Additionally, recall rates were highest at 95.79% for gastric cancer and lowest at 77.46% for cervical cancer, demonstrating robust performance across various cancer types. The model identified key genes such as C3orf36, JHY, and TASP1, showing significant differences in mutation counts across cancers. High-impact gene enrichment analysis highlighted critical pathways like acute myeloid leukemia and adipocytokine signaling. This approach not only significantly improves the precision of cancer classification, demonstrating the potential for clinical application, but also enhances our understanding of cancer biology.

Volume 27, Issue 2 Publication type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13006975/pdf/
Citations: 0
Castl: robust identification of spatially variable genes in spatial transcriptomics via an ensemble-based framework.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag074
Yiyi Yu, Jiyuan Yang, Ping-An He, Xiaoqi Zheng

Spatially variable genes (SVGs) are essential for elucidating tissue organization within spatially resolved transcriptomics. While a number of computational methods have been developed for SVG identification, their reliance on algorithm-specific assumptions, such as predefined kernel functions or spatial neighborhood graphs, often results in substantial variability in sensitivity and inflated false discovery rates (FDRs) across heterogeneous datasets. To address this challenge, we here develop Castl, an ensemble-based framework for SVG identification that integrates multiple detection methods through statistically designed aggregation modules. Comprehensive evaluations on both simulated and real-world data demonstrate that Castl consistently identifies biologically meaningful spatial expression patterns, mitigates method-specific biases and effectively controls FDRs across various biological contexts, resolutions, and spatial technologies. This flexible, assumption-free framework offers a robust and standardized foundation for spatially informed feature discovery in complex biological systems.
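The abstract does not say which aggregation modules Castl uses, so the sketch below only illustrates one generic way to pool per-gene p-values from several detection methods and then control the false discovery rate. The Cauchy combination rule and the Benjamini-Hochberg procedure are stand-ins chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def cauchy_combine(pvals):
    """Combine p-values across methods with the Cauchy combination rule.

    pvals: array of shape (n_genes, n_methods); equal method weights assumed.
    """
    p = np.clip(np.asarray(pvals, dtype=float), 1e-15, 1 - 1e-15)
    t = np.mean(np.tan((0.5 - p) * np.pi), axis=1)
    return 0.5 - np.arctan(t) / np.pi

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity by taking running minima from the largest rank down.
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q
```

A gene would then be called spatially variable when its adjusted value falls below a chosen threshold, for example `benjamini_hochberg(cauchy_combine(pvals)) < 0.05`.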

Volume 27, Issue 2 Publication type: Journal Article PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12963980/pdf/
Citations: 0
scSCCNIA: similarity matrix based contrastive clustering with neighbor information aggregation for single-cell RNA sequencing data.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag094
Jing Wang, Junfeng Xia, Yansen Su, Chun-Hou Zheng

The development of single-cell RNA sequencing (scRNA-seq) technology provides unprecedented opportunities for elucidating cell heterogeneity and gene expression. Identifying cell types through cell clustering is a crucial step in analyzing scRNA-seq data. However, the high dimensionality and frequent dropout events of the data pose great challenges for cell clustering. Here, we propose a novel contrastive clustering framework called scSCCNIA (Similarity-matrix-based Contrastive Clustering with Neighbor Information Aggregation) for the accurate identification of cell clusters from scRNA-seq data. scSCCNIA adopts a Laplacian filter to aggregate neighbor information, constructs different graph views for data augmentation by using Siamese encoders with unshared parameters, and learns latent low-dimensional embeddings via similarity-matrix-based contrastive learning. Comparative analyses of multiple scRNA-seq datasets from different platforms and with varying cell numbers demonstrate that scSCCNIA outperforms existing methods in terms of cell clustering and marker gene identification. Furthermore, scSCCNIA reveals the heterogeneity and functional specificity of various cell types through Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses. Overall, scSCCNIA is an effective algorithm for learning latent features from scRNA-seq data, enhancing cell type identification accuracy and facilitating downstream analyses of scRNA-seq data.
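
The neighbor information aggregation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact filter, so the code below assumes one common choice: a k-nearest-neighbor graph over cells and the low-pass filter H = I - L_sym (equivalently the symmetrically normalized adjacency with self-loops), applied t times. The function names `knn_adjacency` and `laplacian_filter` and the parameters `k` and `t` are hypothetical and are not taken from scSCCNIA.

```python
import math


def knn_adjacency(X, k):
    """Build a symmetric 0/1 kNN adjacency matrix from Euclidean distances."""
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort all other cells by distance to cell i and keep the k closest.
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            A[i][j] = A[j][i] = 1.0
    return A


def laplacian_filter(X, A, t=1):
    """Smooth features with H = I - L_sym, assumed here to equal
    D^{-1/2} (A + I) D^{-1/2}, applied t times."""
    n = len(X)
    # Add self-loops so each cell keeps part of its own signal.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in A_hat]
    S = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    out = [list(row) for row in X]
    n_features = len(out[0])
    for _ in range(t):
        # One filtering pass: every cell becomes a weighted average of
        # itself and its graph neighbors.
        out = [[sum(S[i][m] * out[m][g] for m in range(n))
                for g in range(n_features)]
               for i in range(n)]
    return out
```

On a toy matrix with two well-separated groups of cells, one pass pulls each cell's profile toward the mean of its group, which is the denoising effect such a filter is usually chosen for.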

Citations: 0
A progressive fine-tuning framework with dynamic parameter selection for low-resource peptide-GPCR interaction prediction.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag116
Mingqing Liu, Jinhui Xu, Ji Liu

G protein-coupled receptors (GPCRs) are among the most important drug targets, and peptide therapeutics are rapidly emerging. However, accurate prediction of peptide-GPCR interactions (PepGI) remains challenging due to the scarcity of high-quality data and the poor generalization of existing drug-target interaction (DTI) models, which are largely trained on small-molecule data. Here, we introduce a progressive fine-tuning framework with a dynamic parameter selection strategy that adaptively selects critical fine-tuning parameters using Fisher information. Our method begins with pretraining on a large small-molecule-GPCR dataset, followed by intermediate fine-tuning on peptide-target data to alleviate the representation mismatch across heterogeneous ligand modalities. Finally, task-specific fine-tuning is performed on the low-resource PepGI scenario. Extensive experiments show that our approach significantly outperforms baselines across multiple evaluation metrics and exhibits robust generalization under few-shot and practical cold-start settings. Overall, this work offers an effective solution for low-resource peptide-GPCR prediction and presents a transferable framework for cross-structure DTI modeling.
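
Fisher-information-based parameter selection, as named in the abstract, can be sketched on a toy model. The abstract does not state the selection rule, so this is only an illustration under stated assumptions: a logistic-regression model stands in for the real network, importance is the empirical diagonal Fisher information (mean squared per-sample gradient of the log-likelihood), and the top-k parameters are updated while the rest stay frozen. All function names and the `k` and `lr` parameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_fisher(weights, X, y):
    """Empirical diagonal Fisher information: the mean over samples of the
    squared gradient of the log-likelihood with respect to each weight."""
    fisher = [0.0] * len(weights)
    for xi, yi in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, xi)))
        for j, v in enumerate(xi):
            g = (yi - p) * v  # d log-likelihood / d w_j for this sample
            fisher[j] += g * g
    return [f / len(X) for f in fisher]


def select_top_k(fisher, k):
    """Return a boolean mask that is True for the k highest-Fisher weights."""
    ranked = sorted(range(len(fisher)), key=lambda j: fisher[j], reverse=True)
    keep = set(ranked[:k])
    return [j in keep for j in range(len(fisher))]


def masked_sgd_step(weights, X, y, mask, lr=0.1):
    """One gradient step on the negative log-likelihood; weights whose
    mask entry is False are left frozen."""
    grad = [0.0] * len(weights)
    for xi, yi in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, xi)))
        for j, v in enumerate(xi):
            grad[j] += (p - yi) * v
    n = len(X)
    return [w - lr * g / n if m else w
            for w, g, m in zip(weights, grad, mask)]
```

In a staged setting, the mask could be recomputed before each fine-tuning stage so that the set of trainable parameters follows the data of that stage; whether the published method does this is not stated in the abstract.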

Citations: 0
Publisher's Note: Addendum to Volume 26, Issue Supplement 1, December 2025, International Conference on Genome Informatics ISCB-Asia 2025 Abstract Book.
IF 7.7 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-03-01 DOI: 10.1093/bib/bbag026
Citations: 0