
Briefings in Bioinformatics: latest articles

Dynamic-GLEP: a dynamics-informed deep learning framework for ligand efficacy prediction in representative Class A GPCRs.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbag049
Zhiyi Chen, Yongxin Hao, Yuhong Su, Hans Ågren, Mingan Chen, Zhehuan Fan, Duanhua Cao, Jiacheng Xiong, Wei Zhang, Jin Liu, Xutong Li, Mingyue Zheng, Xi Cheng, Dingyan Wang, Dan Teng

G protein-coupled receptors (GPCRs) represent the largest membrane protein family and remain central targets in drug discovery. Ligand efficacy reflects the ability to modulate receptor conformational states and extends beyond binding affinity to underpin functional selectivity. However, most computational approaches still emphasize affinity prediction, with limited capacity to capture the conformational dynamics driving efficacy. Here, we introduce Dynamic-GLEP, a structure- and mechanism-aware framework that integrates molecular dynamics (MD)-derived conformational ensembles with transfer learning on equivariant graph neural networks. By constructing multi-conformation receptor-ligand complexes and fine-tuning the EquiScore model, Dynamic-GLEP identifies conformation-dependent interaction features to distinguish agonists from nonagonists. Applied to the 5-HT1A receptor, the framework achieved an area under the curve (AUC) of 0.74 in cross-validation and 0.71 on an external Food and Drug Administration (FDA)-related dataset. Comparative analyses showed that holo-based models were advantageous for scaffold optimization, whereas apo-derived ensembles provided greater adaptability to chemically diverse ligands. Furthermore, extension to the adenosine A2A receptor yielded high performance (AUC > 0.85), underscoring the method's robustness and transferability under data-scarce conditions. Collectively, these results highlight Dynamic-GLEP as a reliable and interpretable platform for ligand efficacy prediction in Class A GPCRs, with broad potential to support virtual screening, candidate prioritization, and mechanism-driven drug design.
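
The AUC values quoted in this abstract are rank statistics: the probability that a randomly chosen agonist receives a higher score than a randomly chosen nonagonist. The following is a minimal pure-Python sketch of that definition, not the authors' evaluation code. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = agonist, 0 = nonagonist.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

On this reading, an AUC of 0.74 means that an agonist outranks a nonagonist in roughly three of four random pairings. The quadratic pairwise loop is chosen for clarity; a sort-based version would be preferable for large datasets.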

Citations: 0
GPCRact: a hierarchical framework for predicting ligand-induced GPCR activity via allosteric communication modeling.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf719
Hyojin Son, Gwan-Su Yi

Accurate prediction of ligand-induced activity for G-protein-coupled receptors (GPCRs) is a cornerstone of drug discovery, yet it is challenged by the need to model allosteric communication, the long-range signaling that links ligand binding to distal conformational changes. Prevailing sequence-based models often fail to capture these three-dimensional dynamics, a limitation frequently masked by averaged performance on simpler Class A targets. To address this, we introduce GPCRact, a novel framework that models the biophysical principles of allosteric modulation in GPCR activation. It first constructs a high-resolution, three-dimensional structure-aware graph from the heavy-atom coordinates of functionally critical residues at binding and allosteric sites. A dual attention architecture then captures the activation process: cross-attention encodes the initial ligand-protein interaction at the binding site, whereas self-attention learns the subsequent intra-protein signal propagation. This hierarchical architecture is built upon an E(n)-Equivariant Graph Neural Network (EGNN) to explicitly model conformational consequences of ligand binding, and is further refined with a tailored loss function and inference logic to mitigate error propagation. Underpinned by GPCRactDB, a comprehensive database we constructed for this study, GPCRact not only achieves state-of-the-art performance but also demonstrates robustly superior accuracy on a curated benchmark of allosterically complex receptors where existing models systematically underperform. Crucially, analysis of the learned attention weights confirms that the model identifies biologically validated allosteric pathways, offering a significant step toward resolving the black-box nature of previous methods. Thus, GPCRact provides a more accurate, interpretable, and mechanistically grounded solution to a long-standing challenge, paving the way for effective structure-guided drug discovery.
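
The two-stage attention described in this abstract can be illustrated with plain scaled dot-product attention. This is a rough sketch under stated assumptions, not the GPCRact implementation: it has no learned projections and no EGNN layers, it assumes that binding-site residues act as queries over ligand atoms in the cross-attention step, and the 2-D feature vectors are invented for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; returns (outputs, weights)."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(logits)
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical 2-D feature vectors (not from the paper).
ligand_atoms = [[1.0, 0.0], [0.0, 1.0]]
binding_site = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Step 1 (assumed direction): residues attend to ligand atoms (cross-attention).
site_updated, cross_w = attention(binding_site, ligand_atoms, ligand_atoms)

# Step 2: residues attend to each other (self-attention), a stand-in for
# intra-protein signal propagation.
propagated, self_w = attention(site_updated, site_updated, site_updated)
```

The attention weight matrices (`cross_w`, `self_w`) are the kind of quantity the abstract refers to when it mentions analyzing learned attention weights: each row is a probability distribution over the attended elements.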

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805254/pdf/
Citations: 0
A systematic review of molecular representation learning foundation models.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf703
Bosheng Song, Jiayi Zhang, Ying Liu, Yuansheng Liu, Jing Jiang, Sisi Yuan, Xia Zhen, Yiping Liu

Molecular representation learning (MRL) is a foundation for applying computational methods to drug discovery, enabling the transformation of molecular structures and properties into numerical vectors. These vectors serve as input for machine learning models and facilitate the prediction and analysis of molecular attributes, functions, and reactions. The advent of foundation models has introduced both new opportunities and challenges to MRL. These models have improved generalizability and transferability when data are scarce. Through pretraining and fine-tuning, foundation models can be adapted to various domains. Their robust encoding and generative abilities also allow the transformation of molecular data into more expressive forms. This paper provides a detailed review of current mainstream molecular descriptors and datasets, focusing primarily on the representation of small molecules while excluding larger molecules such as proteins and peptides. It classifies foundation models into two primary categories based on the form of input: unimodal-based and multimodal-based models. For each category, representative models are identified and their advantages and disadvantages evaluated. Moreover, we systematically summarize four core pretraining strategies for MRL foundation models, analyzing their task designs, applicable scenarios, and impacts on downstream performance. In addition, the application of molecular representation foundation models in drug discovery and development is discussed, together with the current status of model interpretability. The paper concludes with insights into the future directions of MRL foundation models.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12784970/pdf/
Citations: 0
DeepRMSF: a deep learning-based automated approach for predicting atomic-level flexibility in RNA structure.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf720
Chenjie Feng, Xiaowen Sun, Xintao Song, Lei Bao, Weikang Gong, Renmin Han

Understanding RNA conformational dynamics is essential to understanding its roles in complex biological processes. While computational methods have revolutionized the prediction of static 3D RNA structures, predicting local flexibility directly from structure remains a significant challenge. We developed DeepRMSF, a deep learning-based method that leverages atomic-level descriptions of RNA to predict vibrational flexibility given a tertiary structure. Trained on MD-derived root-mean-square fluctuations (RMSF), DeepRMSF was benchmarked on 371 nonredundant RNAs, with 311 RNAs used for five-fold cross-validation (PCC = 0.7219-0.7464) and 60 RNAs as an independent test set (PCC = 0.734), ensuring minimal sequence/structural similarity between sets. DeepRMSF predicts the local flexibility of medium-sized RNAs (~75 nucleotides) in ~8.2 s, a >3000-fold speed-up over MD simulations, while maintaining strong extrapolative accuracy. Rather than replacing MD, DeepRMSF offers a scalable and practical alternative for transcriptome-scale screening of RNA flexibility, facilitating studies on RNA structure-dynamics-function relationships and supporting computational modeling in RNA biology.
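
Two quantities carry this abstract: the per-atom RMSF used as the training target and the Pearson correlation coefficient (PCC) used to score predictions. The sketch below shows the textbook definitions in pure Python. It assumes the trajectory frames are already superimposed on a common reference, and the toy trajectory and the `predicted` values are hypothetical rather than DeepRMSF outputs.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples; frames are assumed to be
    pre-aligned, so no superposition is performed here.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for a in range(n_atoms):
        mean = [sum(frame[a][c] for frame in trajectory) / n_frames for c in range(3)]
        msd = sum(
            sum((frame[a][c] - mean[c]) ** 2 for c in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy trajectory: 2 frames, 3 atoms.
traj = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (2.0, 0.0, 4.0)],
]
reference = rmsf(traj)        # MD-style target values
predicted = [0.1, 0.9, 2.2]   # made-up model output for illustration
pcc = pearson(reference, predicted)
```

Because PCC measures linear agreement rather than absolute error, a model can score well while still being off by a scale factor, which is worth keeping in mind when reading the reported values.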

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12798811/pdf/
Citations: 0
Predicting protein-carbohydrate binding sites: a deep learning approach integrating protein language model embeddings and structural features.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbag008
Md Muhaiminul Islam Nafi, M Saifur Rahman

Protein-carbohydrate interactions play an important role in many biological processes and functions, such as inflammation, signal transduction, and cell adhesion. This work focuses on non-covalent carbohydrate binding sites: we aim to build a deep learning model that predicts non-covalent protein-carbohydrate binding sites. Because experimental approaches for identifying these sites are expensive, computational tools are necessary. We explored several sequence-based and structural features and also leveraged protein language model embeddings. We analyzed different architectures and selected the most suitable deep learning architecture for our final prediction model, DeepCPBSite. DeepCPBSite is an ensemble that combines three separate models trained with three approaches (random undersampling, weighted oversampling, and class-weighted loss), all built on the ResNet+FNN architecture. We built separate datasets from three sources: RCSB, UniProt, and CASP. We also compared the structural features extracted from structures predicted by AlphaFold and ESMFold in the context of our prediction tasks. We employed three different feature selection techniques and finally performed a SHAP (SHapley Additive exPlanations) analysis of the structural features after categorizing the proteins by organism. DeepCPBSite achieved 78.7% balanced accuracy and 59.6% sensitivity on the TS53 set, outperforming the second-best competitor, DeepGlycanSite, by 1.16% and 2.94%, respectively. Additionally, its F1, MCC, and AUPR scores outperformed other state-of-the-art methods, with improvements ranging from 3.77% to 47.6%, 3.84% to 32.7%, and 8.18% to 60.21%, respectively.
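
The reported metrics are linked by simple arithmetic. Balanced accuracy is the mean of sensitivity and specificity, so the two quoted figures imply a specificity of about 2 × 0.787 − 0.596 = 0.978, assuming the standard binary definition applies. The sketch below, with hypothetical confusion-matrix counts that are not from the paper, shows these definitions together with the Matthews correlation coefficient (MCC).

```python
import math


def sensitivity(tp, fn):
    """True-positive rate: fraction of binding residues recovered."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: fraction of non-binding residues rejected."""
    return tn / (tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity; robust to class imbalance."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for an imbalanced residue-level task.
tp, fn, tn, fp = 60, 40, 880, 20
ba = balanced_accuracy(tp, fn, tn, fp)
m = mcc(tp, fn, tn, fp)

# Specificity implied by the abstract's balanced accuracy and sensitivity.
implied_specificity = 2 * 0.787 - 0.596
```

A high specificity paired with a moderate sensitivity is typical of imbalanced binding-site prediction, which is presumably why the ensemble mixes undersampling, oversampling, and class-weighted loss.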

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853128/pdf/
Citations: 0
Cross-dataset transcriptomic analyses identify a conserved ENPP2+ macrophage-fibroblast activation axis in hypertrophic cardiomyopathy.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbag036
Fanyi Huang, Mi Zhou, Yanjia Chen, Sha Hua, Yanxin Han, Yingze Fan, Qingchuan Li, Zhuoyan Sun, Ke Yang, Qiang Zhao, Wei Jin

Hypertrophic cardiomyopathy (HCM) is a condition in which approximately 65% of patients exhibit myocardial fibrosis, indicated by late gadolinium enhancement, with the severity and extent of fibrosis being positively correlated with the risk of sudden cardiac death. While fibroblast activation in HCM has been noted in previous studies, the underlying regulatory mechanisms have not been thoroughly explored. In this study, we analyzed the latest single-nucleus RNA sequencing (snRNA-seq) datasets related to HCM caused by the two most common mutations. We also examined the largest existing snRNA-seq and spatial transcriptomics datasets of HCM for external validation. Additionally, we conducted preliminary histopathological and molecular biology experiments to validate our findings and explore potential mechanisms. Our analysis revealed a phenotypic transformation of macrophages in both cases of HCM. These pro-inflammatory macrophages, driven by the high expression of ENPP2, mediated intercellular interactions that influenced fibroblast activation. The resulting increase in lysophosphatidic acid appeared to act as a plausible intermediary. Activated fibroblasts secreted substantial amounts of COL14A1, which is a critical component of myocardial fibrosis. These findings were consistent across different genetic backgrounds, suggesting that they apply to most HCM cases. Our study provides valuable insights into the mechanisms underlying myocardial fibrosis in HCM, highlighting the role of macrophage transformation and fibroblast activation. These findings offer potential for the identification of novel diagnostic or prognostic biomarkers and the development of targeted therapies with clinical translational potential.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12874883/pdf/
Citations: 0
Exploring potential transcription factors and their regulatory relationships based on asymmetric covariance natural vector encoding method and machine learning algorithms.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbag044
Guoqing Hu, Mengmeng Sang, Hao Wang, Jia Ge, Lin Xu, Stephen S-T Yau

Transcription factors (TFs) orchestrate cellular programs by activating or repressing gene expression in response to diverse stimuli. Although advances in experimental and computational biology have expanded our understanding of TFs, existing prediction methods still struggle to accurately capture TF-target regulatory relationships and determine their directionality (activation versus inhibition). Here, we propose ACNVE-K, an integrative framework combining k-mer decomposition with asymmetric covariance natural vector encoding to convert amino acid sequences into multidimensional feature vectors. Using eXtreme Gradient Boosting (XGBoost), Gradient Boosting (GB), and Random Forest (RF) algorithms, we constructed five predictive models for TF identification, target gene inference, and regulatory direction classification. Benchmarking analyses demonstrated that XGBoost achieved the highest predictive performance across human and mouse genomes, particularly with updated genome annotations. The 5-mer configuration provided an optimal balance between feature richness and computational efficiency. Collectively, ACNVE-K offers a robust and interpretable framework for decoding transcriptional regulation, facilitating advances in precision medicine, regulatory genomics, and machine-learning-based gene network reconstruction.
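
To make the encoding idea concrete, the sketch below combines k-mer decomposition with the classic natural vector, which records for each symbol its count, mean position, and a normalized second central moment of its positions. This is an illustrative assumption about the general approach, not the ACNVE-K encoding: the asymmetric covariance terms are not defined in the abstract and are not reproduced here, and normalizing by the number of k-mer windows is a choice made for this example. The function names and the toy sequence are hypothetical.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each k-mer to the 1-based start positions where it occurs."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)
    return dict(positions)


def natural_vector(seq, k):
    """Per-k-mer (count, mean position, normalized second central moment).

    The second moment is divided by count * n, where n is the number of
    k-mer windows (an assumption made for this sketch).
    """
    n = len(seq) - k + 1
    features = {}
    for kmer, pos in kmer_positions(seq, k).items():
        count = len(pos)
        mean = sum(pos) / count
        d2 = sum((p - mean) ** 2 for p in pos) / (count * n)
        features[kmer] = (count, mean, d2)
    return features


# Hypothetical toy peptide, decomposed into 2-mers.
vec = natural_vector("MKMKA", 2)
```

With the 20 standard amino acids there are 20^5 = 3,200,000 possible 5-mers, which gives a sense of the trade-off between feature richness and computational cost that the abstract mentions for the 5-mer configuration.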

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12885101/pdf/
Citations: 0
Turning heterogeneity of statistical epistasis networks to an advantage.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf699
Diane Duroux, Federico Melograna, Héctor Climente-González, Bowen Fan, Andrew Walakira, Edoardo Efrem Gervasoni, Zuqi Li, Damian Roqueiro, Fabio Stella, Kristel Van Steen

Epistasis detection is hindered by multiple challenges, including the proliferation of analytic tools and the diverse methodological choices made in Genome-Wide Association Interaction Studies (GWAIS). These factors often produce inconsistent and only partially overlapping results, with individual methods emphasizing distinct aspects of epistasis. Although comparative evaluations of GWAIS approaches exist, they generally do not identify the factors responsible for methodological discrepancies or assess their implications for biomedical research. Consequently, it remains unclear which features of GWAIS strategies contribute most to these differences and which methods are most appropriate for revealing specific genetic architectures. Here, we present a workflow designed to characterize heterogeneity in GWAIS results and to derive practical recommendations systematically. First, we assess non-replicability by comparing single-nucleotide polymorphism (SNP) pair rankings and Statistical Epistasis Networks (SENs), graphs in which nodes represent genetic loci and edges denote epistatic interactions, to identify clusters of protocols with similar outcomes. SENs provide a structured framework for visualizing and comparing variation in epistasis detection, enabling prioritization of interactions recurrently identified across methods. Second, we propose strategies to reduce heterogeneity and enhance robustness, with particular emphasis on interpretability. Notably, we demonstrate that differences among SENs can be informative rather than disadvantageous, as they yield complementary perspectives on disease genetics. Finally, we highlight the benefits of informed SEN aggregation, showing how this approach can strengthen the utility of GWAIS for elucidating biological mechanisms relevant to disease prevention, diagnosis, and management.
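The comparison and aggregation steps described in this abstract can be sketched as follows. The abstract does not say which similarity measure or aggregation rule the authors use, so this example assumes Jaccard similarity between edge sets for comparing protocols and a simple support threshold for aggregation. The protocol names and SNP identifiers are hypothetical toy values.

```python
from collections import Counter
from itertools import combinations


def edge(a, b):
    """Represent an undirected epistatic interaction as an order-independent key."""
    return frozenset((a, b))


def jaccard(edges_a, edges_b):
    """Jaccard similarity between two edge sets (1.0 if both are empty)."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


def pairwise_similarity(networks):
    """Similarity for every pair of protocols, keyed by sorted protocol names."""
    return {
        (p, q): jaccard(networks[p], networks[q])
        for p, q in combinations(sorted(networks), 2)
    }


def consensus_edges(networks, min_support):
    """Keep edges reported by at least `min_support` protocols."""
    support = Counter(e for edges in networks.values() for e in edges)
    return {e for e, count in support.items() if count >= min_support}


# Hypothetical SENs: one edge set per analysis protocol.
networks = {
    "protocol_A": {edge("rs1", "rs2"), edge("rs2", "rs3")},
    "protocol_B": {edge("rs1", "rs2"), edge("rs3", "rs4")},
    "protocol_C": {edge("rs2", "rs1"), edge("rs2", "rs3")},
}

similarity = pairwise_similarity(networks)
recurrent = consensus_edges(networks, min_support=2)
```

Protocols with high pairwise similarity would fall into the same cluster, while edges that appear only in one cluster may still carry complementary information, which is the point the abstract makes about heterogeneity being informative.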

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12814973/pdf/
Citations: 0
BiChemoCLAM: a weakly supervised multimodal framework for chemotherapy response prediction.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbaf728
Jinglong Gui, Changming Sun, Jia Zhou, Cun Xie, Leyi Wei, Jia Zhao, Xiaofeng Liu, Ran Su

Chemotherapy is an important treatment for cancer patients, but it carries risks, so effective prediction of chemotherapy response is crucial. While whole-slide images provide high-resolution insights into tumour environments, existing weakly supervised learning frameworks struggle to integrate molecular data such as gene expression, which limits their predictive power for complex chemotherapy responses and in small-sample scenarios. We present BiChemoCLAM, a bimodal multiple-instance learning framework that combines attention-driven multiple instance learning with multimodal compact bilinear pooling for interpretable and data-efficient chemotherapy response prediction. It achieves an area under the curve (AUC) of 80.91%, 71.68%, and 75.80% on ovarian serous cystadenocarcinoma, colorectal adenocarcinoma, and bladder urothelial carcinoma datasets, respectively. These results indicate that BiChemoCLAM is an effective model for predicting response to chemotherapy.
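To make the "attention-driven multiple instance learning" component concrete, here is a minimal sketch of generic attention pooling over a bag of patch features: each patch gets a score from a small tanh layer, the scores are normalized with a softmax, and the slide-level embedding is the weighted sum of patch features. This is a standard formulation assumed for illustration, not BiChemoCLAM's actual architecture. The parameters `V` and `w`, the feature values, and the function name are hypothetical, and the multimodal compact bilinear pooling step is not shown.

```python
import math


def attention_pool(instances, V, w):
    """Attention pooling over a bag of instance feature vectors.

    instances: list of feature vectors, one per image patch
    V: hidden-by-dim weight matrix (list of rows), assumed already learned
    w: hidden-length weight vector, assumed already learned
    Returns (attention_weights, bag_embedding).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))

    # Numerically stable softmax over the instance scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag embedding: attention-weighted sum of instance features.
    dim = len(instances[0])
    bag = [sum(a * h[j] for a, h in zip(weights, instances)) for j in range(dim)]
    return weights, bag


# Hypothetical bag of three 2-dimensional patch features.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
weights, bag = attention_pool(patches, V=[[1.0, 0.0]], w=[2.0])
```

The attention weights double as an interpretability signal: patches with larger weights contribute more to the slide-level prediction, which is one common way such models are made interpretable.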

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12834304/pdf/
Citations: 0
Multi-platform integration of brain and CSF proteomes reveals biomarker panels for Alzheimer's disease.
IF 7.7 Zone 2 (Biology) Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2026-01-07 DOI: 10.1093/bib/bbag012
Wei-Yun Tsai, Pieter Giesbertz, Stephan Breimann, Stefan Lichtenthaler, Dmitrij Frishman

Alzheimer's disease (AD) is the leading cause of dementia and represents a progressive, irreversible neurodegenerative disorder. Given the complexity and heterogeneity of AD, which involves numerous interrelated molecular pathways, large-scale proteomics datasets are essential for robust biomarker discovery. Comprehensive proteomic profiling enables the unbiased identification of novel biomarkers across diverse biological processes, thereby increasing the likelihood of finding sensitive and specific candidates for early diagnosis and therapeutic targeting. In this study, we analyzed 28 large-scale proteomics datasets obtained from the AD Knowledge Portal and published studies. The data comprise tandem mass tag, label-free quantification, and proximity extension assay measurements from brain tissue and cerebrospinal fluid. To enhance analytical power, we integrated these proteomic profiles with corresponding clinical information to construct comprehensive feature sets for subsequent machine learning analysis. Using Random Forest and Logistic Regression models, we identified a panel of proteins capable of distinguishing AD patients from healthy controls. Several of these biomarkers have been previously validated in the context of AD, while others represent novel candidates not yet reported as AD-associated. These newly identified biomarkers warrant further experimental validation and hold promise for improving early diagnosis as well as guiding the development of targeted therapies for AD.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12853123/pdf/
Citations: 0