Interdisciplinary Sciences: Computational Life Sciences: Latest Articles, Page 9

SpatialCVGAE: Consensus Clustering Improves Spatial Domain Identification of Spatial Transcriptomics Using VGAE.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-01 Epub Date: 2024-12-16 DOI: 10.1007/s12539-024-00676-1

Jinyun Niu, Fangfang Zhu, Donghai Fang, Wenwen Min

The advent of spatially resolved transcriptomics (SRT) has provided critical insights into the spatial context of tissue microenvironments. Spatial clustering is a fundamental step in analyzing SRT data. However, spatial clustering methods often suffer from instability caused by the sparsity and high noise of SRT data. To address this challenge, we propose SpatialCVGAE, a consensus clustering framework designed for SRT data analysis. SpatialCVGAE takes the expression of highly variable genes at different dimensions, along with multiple spatial graphs, as inputs to variational graph autoencoders (VGAEs), which learn multiple latent representations for clustering. The resulting clusterings are then integrated using a consensus clustering approach, which enhances the model's stability and robustness by combining multiple clustering outcomes. Experiments demonstrate that SpatialCVGAE effectively mitigates the instability typically associated with non-ensemble deep learning methods, significantly improving both the stability and accuracy of the results. Compared with previous non-ensemble methods in representation learning and post-processing, our method fully leverages the diversity of multiple representations to accurately identify spatial domains, showing superior robustness and adaptability. All code and public datasets used in this paper are available at https://github.com/wenwenmin/SpatialCVGAE .
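The consensus step described in this abstract can be illustrated with a co-association matrix, a common way to combine several clusterings. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 0.5 threshold, and the connected-components rule are all hypothetical choices made here, and the toy label lists stand in for clusterings of VGAE embeddings.

```python
def co_association(labelings):
    """Fraction of runs in which each pair of spots shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_labels(labelings, threshold=0.5):
    """Group spots that co-cluster in more than `threshold` of the runs.

    Assumption: consensus clusters are the connected components of the
    thresholded co-association graph (one simple rule among many).
    """
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical clustering runs over six spots.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, different label names
    [0, 0, 1, 1, 1, 1],  # one spot assigned differently
]
print(consensus_labels(runs))  # [0, 0, 0, 1, 1, 1]
```

Because the matrix only records whether two spots share a cluster, the arbitrary label names used by each run do not matter, and the single disagreeing run is outvoted.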

Pages: 497-518

Citations: 0

MLWNNR: LncRNA-Disease Association Prediction with Multi-Kernel Learning-Driven Weighted Nuclear Norm Regularization.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-01 Epub Date: 2025-06-23 DOI: 10.1007/s12539-025-00717-3

Guo-Bo Xie, Hao-Jie Xu, Guo-Sheng Gu, Zhi-Yi Lin, Jun-Rui Yu, Rui-Bin Chen

Emerging evidence highlights long non-coding RNAs (lncRNAs) as pivotal regulators with significant links to diverse human pathologies through expression dynamics and regulatory cascades. This study develops an algorithm for predicting associations between lncRNAs and diseases based on multi-kernel learning-driven weighted nuclear norm regularization (MLWNNR). Specifically, our framework first uses a kernel learning algorithm centered on k-nearest neighbors to integrate multiple similarity kernels. Then, we construct a heterogeneous lncRNA-disease association network using similarity information and confirmed lncRNA-disease associations. Finally, we adopt weighted nuclear norm regularization to complete the heterogeneous network and derive the final association prediction scores. MLWNNR achieves strong performance on three datasets and outperforms six representative models in comparative experiments, which demonstrates its robustness and generalization ability. Furthermore, in case studies centered on three common human diseases, the majority of the predicted associations are corroborated by experimental literature. According to the experimental results, MLWNNR is a reliable approach for inferring lncRNA-disease associations.
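To make the idea of weighted nuclear norm regularization more concrete, the sketch below contrasts ordinary singular value soft-thresholding with a weighted variant. It is only an illustration under stated assumptions: the abstract does not say how the weights are chosen, so the rule used here (weight inversely proportional to the singular value, a common choice in the weighted nuclear norm literature) is an assumption, as are the function names and constants. The singular values themselves would come from an SVD of the association matrix, which is left to a linear algebra library.

```python
def uniform_shrink(singular_values, tau):
    """Standard nuclear norm step: subtract the same threshold from every value."""
    return [max(s - tau, 0.0) for s in singular_values]


def weighted_shrink(singular_values, c=1.0, eps=1e-8):
    """Weighted step: each value s is shrunk by its own weight c / (s + eps).

    Assumption: weights are inversely proportional to the singular value,
    so large (informative) components are barely touched and small
    (noisy) components are removed.
    """
    return [max(s - c / (s + eps), 0.0) for s in singular_values]


# Hypothetical singular values of an association matrix, largest first.
sigma = [10.0, 2.0, 0.5]
print(uniform_shrink(sigma, tau=0.5))   # [9.5, 1.5, 0.0]
print(weighted_shrink(sigma, c=1.0))    # roughly [9.9, 1.5, 0.0]
```

The weighted version penalizes the dominant component less (9.9 instead of 9.5) while still zeroing the smallest one, which is the motivation for weighting the nuclear norm in matrix completion.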

Pages: 673-690

Citations: 0

NPI-HGNN: A Heterogeneous Graph Neural Network-Based Approach for Predicting ncRNA-Protein Interactions.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-01 Epub Date: 2025-02-21 DOI: 10.1007/s12539-025-00689-4

Xin Zhang, Haofeng Ma, Sizhe Wang, Hao Wu, Yu Jiang, Quanzhong Liu

Accurate identification of ncRNA-protein interactions (NPIs) is critical for understanding the cellular activities and biological functions of ncRNAs and proteins. Many computational approaches based on sequence and/or structure, as well as graph-based approaches, have been developed to identify NPIs from large-scale ncRNA and protein data in a high-throughput manner. However, these approaches often ignore either the topological information in NPIs or the influence of other molecular networks on NPI prediction. In this work, we propose NPI-HGNN, an end-to-end graph neural network (GNN)-based approach for identifying NPIs from a large heterogeneous network consisting of the ncRNA-protein interaction network, the ncRNA-ncRNA similarity network, and the protein-protein interaction network. To our knowledge, NPI-HGNN is the first GNN-based predictor that integrates related heterogeneous networks for NPI prediction. Experiments on five benchmark datasets demonstrate that NPI-HGNN outperforms several state-of-the-art sequence- and/or structure-based and graph-based predictors. In addition, we showcase the predictive power of NPI-HGNN by identifying 12 interacting ncRNAs of the pre-mRNA 3' end processing protein, which indicates the effectiveness of the proposed model. The source code of NPI-HGNN is freely available for academic purposes at https://github.com/zhangxin11111/NPI-HGNN .
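The heterogeneous network named in this abstract combines three edge types over two node types. The sketch below shows one possible in-memory layout and how the two auxiliary networks can supply evidence for an unobserved ncRNA-protein pair. It is not a GNN and not the authors' code: the function names, edge-type labels, and toy identifiers (r1, p1, and so on) are hypothetical, and the two-hop lookup merely illustrates the neighborhoods that message passing would operate on.

```python
def build_hetero_graph(npi_edges, rna_sim_edges, ppi_edges):
    """Return {node: {edge_type: set(neighbors)}} with typed, undirected edges."""
    graph = {}

    def add(u, v, etype):
        graph.setdefault(u, {}).setdefault(etype, set()).add(v)
        graph.setdefault(v, {}).setdefault(etype, set()).add(u)

    for r, p in npi_edges:
        add(("ncRNA", r), ("protein", p), "interacts")
    for r1, r2 in rna_sim_edges:
        add(("ncRNA", r1), ("ncRNA", r2), "similar")
    for p1, p2 in ppi_edges:
        add(("protein", p1), ("protein", p2), "ppi")
    return graph


def candidate_proteins(graph, rna):
    """Proteins two hops from an ncRNA that are not yet known partners."""
    node = ("ncRNA", rna)
    known = graph.get(node, {}).get("interacts", set())
    found = set()
    # Path 1: partners of similar ncRNAs.
    for other in graph.get(node, {}).get("similar", set()):
        found |= graph.get(other, {}).get("interacts", set())
    # Path 2: PPI neighbors of known partners.
    for protein in known:
        found |= graph.get(protein, {}).get("ppi", set())
    return {name for kind, name in found - known}


# Toy data (hypothetical identifiers).
g = build_hetero_graph(
    npi_edges=[("r1", "p1"), ("r2", "p2")],
    rna_sim_edges=[("r1", "r2")],
    ppi_edges=[("p1", "p3")],
)
print(sorted(candidate_proteins(g, "r1")))  # ['p2', 'p3']
```

Here p2 is reached through the similar ncRNA r2 and p3 through the protein-protein edge from p1, so neither would be visible from the interaction network alone.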

Pages: 649-661

Citations: 0

ResNeXt-Based Rescoring Model for Proteoform Characterization in Top-Down Mass Spectra.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-01 Epub Date: 2025-05-17 DOI: 10.1007/s12539-025-00701-x

Jiancheng Zhong, Yicheng Luo, Chen Yang, Maoqi Yuan, Shaokai Wang

In top-down proteomics, the accurate identification and characterization of proteoforms through mass spectrometry is a critical objective, so accurate identification results are essential. Multiple primary structure alterations in proteins generate a diverse range of proteoforms, resulting in an exponential increase in the number of potential proteoforms. Moreover, the absence of a definitive reference set complicates the standardization of results. Therefore, enhancing the accuracy of proteoform characterization remains a significant challenge. We introduce a ResNeXt-based deep learning model, PrSMBooster, for rescoring proteoform spectrum matches (PrSMs) during proteoform characterization. As an ensemble method, PrSMBooster integrates four machine learning models (logistic regression, XGBoost, decision tree, and support vector machine) as weak learners to obtain PrSM features. The basic and latent features of each PrSM are subsequently input into the ResNeXt model for final rescoring. To verify the effect and accuracy of PrSMBooster in rescoring proteoform characterization, it was compared with the characterization algorithm TopPIC across 47 independent mass spectrometry datasets from various species. The experimental results indicate that, in most mass spectrometry datasets, rescoring with PrSMBooster increases the number of PrSMs identified at a false discovery rate (FDR) of 1%. Further analysis of the experimental results confirmed that PrSMBooster improves the accuracy of PrSM scoring, generates more mass spectrometry characterization results, and demonstrates strong generalization ability.
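The comparison reported in this abstract counts PrSMs accepted at a 1% FDR before and after rescoring. A minimal sketch of such a count is shown below, assuming a target-decoy estimate in which the FDR of a score-ranked prefix is the ratio of decoy to target matches. The abstract does not state how FDR is estimated, so that estimator, the function name, and the toy scores are all assumptions made for illustration.

```python
def count_at_fdr(matches, fdr=0.01):
    """Count target matches accepted at the given FDR.

    matches: list of (score, is_decoy) pairs; higher scores are better.
    Assumption: FDR of a ranked prefix is estimated as decoys / targets.
    Returns the number of targets in the largest prefix within the limit.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = best = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr:
            best = targets
    return best


# 120 hypothetical target matches with scores 1..120.
targets = [(float(s), False) for s in range(1, 121)]

# Original scoring: two decoys rank among the top matches.
original = targets + [(100.5, True), (99.5, True)]
# After a hypothetical rescoring: the same decoys fall to the bottom.
rescored = targets + [(0.5, True), (0.2, True)]

print(count_at_fdr(original))  # 20
print(count_at_fdr(rescored))  # 120
```

The set of matches is identical in both cases; only the ranking of decoys changes, which is why a better scoring function yields more identifications at the same FDR.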

Pages: 634-648 Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289818/pdf/

Citations: 0

MTGGF: A Metabolism Type-Aware Graph Generative Model for Molecular Metabolite Prediction.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-01 Epub Date: 2025-01-06 DOI: 10.1007/s12539-024-00681-4

Peng-Cheng Zhao, Xue-Xin Wei, Qiong Wang, Hao-Yang Wang, Bing-Xue Du, Jia-Ning Li, Bei Zhu, Hui Yu, Jian-Yu Shi

Metabolism in vivo turns small molecules (e.g., drugs) into metabolites (new molecules), which raises unexpected safety issues in drug development. However, determining metabolites by biological assays is costly. Recent computational methods offer a promising alternative by predicting possible metabolites. Rule-based methods use predefined reaction-derived rules to infer metabolites, but they cannot handle new metabolic reaction patterns. In contrast, rule-free methods leverage sequence-to-sequence machine translation to generate metabolites. Nevertheless, they characterize molecular structures inadequately and offer weak interpretability. To address these issues in rule-free methods, this manuscript proposes a novel metabolism type-aware graph generative framework (MTGGF) for molecular metabolite prediction. It uses a two-stage learning process: pre-training on a large general chemical reaction dataset, followed by fine-tuning on three smaller type-specific metabolic reaction datasets. Its core, an elaborate graph-to-graph generative model, treats both atoms and bonds as vertices of a bipartite graph, and molecules as bipartite graphs, so that it can embed rich information about molecular structures and ensure the integrity of generated metabolite structures. Comparison with state-of-the-art methods demonstrates its superiority. Furthermore, the ablation study validates the contributions of its two graph encoding components and its reaction-type-specific fine-tuning models. More importantly, based on interactive attention between a molecule and its metabolites, case studies on five approved drugs reveal crucial substructures specific to metabolism types. It is anticipated that this framework can improve the risk evaluation of drug metabolites. The code is available at https://github.com/zpczaizheli/Metabolite .
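The bipartite view of a molecule mentioned in this abstract can be sketched as follows: atoms form one vertex set, bonds form the other, and every edge joins a bond vertex to one of the two atoms it connects. This is a minimal illustration of that data structure only. The function name and tuple layout are hypothetical, and the framework's actual encoding may carry many more features.

```python
def molecule_to_bipartite(atoms, bonds):
    """Convert a molecule to a bipartite atom/bond graph.

    atoms: list of element symbols, indexed by position.
    bonds: list of (atom_i, atom_j, bond_order) tuples.
    Returns (atom_vertices, bond_vertices, edges), where each edge is an
    (atom_index, bond_index) pair, so no edge joins two atoms or two bonds.
    """
    atom_vertices = [("atom", i, symbol) for i, symbol in enumerate(atoms)]
    bond_vertices = [("bond", k, order) for k, (_, _, order) in enumerate(bonds)]
    edges = []
    for k, (i, j, _) in enumerate(bonds):
        edges.append((i, k))
        edges.append((j, k))
    return atom_vertices, bond_vertices, edges


# Ethanol heavy-atom skeleton: C-C-O with two single bonds.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]
atom_v, bond_v, edges = molecule_to_bipartite(atoms, bonds)
print(edges)  # [(0, 0), (1, 0), (1, 1), (2, 1)]
```

Every bond vertex has exactly two incident edges, which gives a simple structural check that a generated graph still corresponds to a valid set of bonds.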

Pages: 528-540

Citations: 0

ScAGCN: Graph Convolutional Network with Adaptive Aggregation Mechanism for scRNA-seq Data Dimensionality Reduction.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-09-01 Epub Date: 2025-04-25 DOI: 10.1007/s12539-025-00702-w

Xiaoshu Zhu, Liquan Zhao, Fei Teng, Shuang Meng, Miao Xie

With the development of single-cell RNA-sequencing (scRNA-seq) technology, scRNA-seq data analysis faces major challenges due to large scale, high dimensionality, high noise, and high sparsity. To obtain accurate embedded representations of large-scale scRNA-seq data, we design a novel graph convolutional network with an adaptive aggregation mechanism. Based on the assumption that the aggregation order differs between cells, we develop a dimensionality reduction algorithm for scRNA-seq data built on a graph convolutional network with adaptive aggregation, named scAGCN. In scAGCN, a preprocessing step consisting of quality control and feature selection is applied first. Then, an approximate nearest neighbor graph is rapidly constructed. Finally, a graph convolutional network with an adaptive aggregation mechanism is constructed, in which a neighborhood selection strategy based on node distribution and similarity boxplots is designed, and the aggregation function is optimized by defining a similarity measure between neighborhood nodes and the central node. The results show that scAGCN outperforms existing dimensionality reduction methods on 15 real scRNA-seq datasets, especially on 10 large-scale scRNA-seq datasets.
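One way to read the boxplot-based neighborhood selection and similarity-weighted aggregation described in this abstract is sketched below. The abstract gives no formulas, so every detail here is an assumption: neighbors whose similarity to the central cell falls below the lower boxplot fence (Q1 minus 1.5 times the interquartile range) are dropped, and the remaining neighbors are averaged with weights proportional to their similarity. Function names and toy values are hypothetical.

```python
from statistics import quantiles


def select_neighbors(similarities):
    """Keep neighbors that are not low-side boxplot outliers.

    similarities: {neighbor_id: similarity to the central node}.
    Assumption: the cutoff is the lower fence Q1 - 1.5 * IQR.
    """
    q1, _, q3 = quantiles(similarities.values(), n=4, method="inclusive")
    lower_fence = q1 - 1.5 * (q3 - q1)
    return {k: s for k, s in similarities.items() if s >= lower_fence}


def aggregate(features, kept):
    """Similarity-weighted mean of the kept neighbors' feature vectors."""
    total = sum(kept.values())
    dim = len(next(iter(features.values())))
    return [
        sum(kept[k] * features[k][d] for k in kept) / total
        for d in range(dim)
    ]


# Hypothetical similarities of five candidate neighbors to one cell.
sims = {"a": 0.90, "b": 0.88, "c": 0.91, "d": 0.87, "e": 0.20}
feats = {
    "a": [1.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 0.0],
    "d": [1.0, 0.0], "e": [0.0, 1.0],
}
kept = select_neighbors(sims)
print(sorted(kept))            # ['a', 'b', 'c', 'd']
print(aggregate(feats, kept))  # [1.0, 0.0]
```

The dissimilar neighbor "e" is excluded before aggregation, so its features do not contaminate the central cell's representation. The number of neighbors kept therefore adapts to each cell's own similarity distribution.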

Pages: 576-585

Citations: 0

AttResAMD: An Attention-Driven Deep Learning Framework for Expert-Level Automated Classification of Age-Related Macular Degeneration from Fundus Photography.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-08-30 DOI: 10.1007/s12539-025-00763-x

Siqi Bao, Zijian Yang, Zicheng Zhang, Jia Qu, Jie Sun

Citations: 0

m⁶ADP-GCNPUAS: m⁶A-Disease Prediction via Graph Convolutional Network and Positive-Unlabeled Learning with Self-Adaptive Sampling.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-08-30 DOI: 10.1007/s12539-025-00760-0

Teng Zhang, Lian Liu

Citations: 0

IQSPred-PLM: An Interpretable Quorum Sensing Peptides Prediction Model Based on Protein Language Model.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-08-26 DOI: 10.1007/s12539-025-00766-8

Yusen Su, Qingyang Guo, Taigang Liu

Quorum sensing regulates cooperative behaviors in bacteria through the accumulation and detection of signaling molecules. This process plays a crucial role in various biological functions, including biofilm formation, antibiotic production, regulation of virulence factors, and immune modulation. Quorum sensing peptides (QSPs), primarily produced by Gram-positive bacteria, are key components of the quorum sensing mechanism, and their identification is crucial for understanding bacterial regulation. Although several QSP prediction tools based on handcrafted features and machine learning techniques are available, their performance and interpretability can still be improved. In this study, we present IQSPred-PLM, a novel model for predicting QSPs that integrates protein language models (PLMs) with a convolutional neural network (CNN). First, we use the pre-trained PLM ESM-2 to encode peptide sequences. Then, feature extraction is performed using a multi-scale residual CNN (MSRes-CNN), with dynamic feature integration through an adaptive weight modulation (AWM) module. Finally, a fully connected network classifies the QSPs. Evaluated on the benchmark dataset, IQSPred-PLM demonstrated outstanding predictive performance, with accuracy (ACC), Matthews correlation coefficient (MCC), and area under the receiver operating characteristic (ROC) curve (AUC) of 97.50%, 0.951, and 0.990, respectively. Furthermore, case studies and interpretability analyses confirmed the effectiveness of IQSPred-PLM for the QSP prediction task.
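The three metrics reported in this abstract (ACC, MCC, and AUC) can be computed from binary labels and classifier scores as sketched below. The labels and scores are invented toy values, not data from the paper, and the 0.5 decision threshold is an assumption. The AUC is computed as the fraction of positive-negative pairs ranked correctly, with ties counted as half.

```python
from math import sqrt


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

cm = confusion(y_true, y_pred)
print(accuracy(*cm))        # 0.75
print(mcc(*cm))             # 0.5
print(auc(y_true, scores))  # 0.9375
```

ACC and MCC depend on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is why the three are usually reported together.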

群体感应通过信号分子的积累和检测来调节细菌的合作行为。这一过程在多种生物功能中起着至关重要的作用，包括生物膜的形成、抗生素的产生、毒力因子的调节和免疫调节。群体感应肽（QSPs）主要由革兰氏阳性菌产生，是群体感应机制的关键组成部分，其鉴定对理解细菌调节至关重要。尽管有几种基于手工特征和机器学习技术的QSP预测工具，但它们的性能和可解释性仍有改进的潜力。在这项研究中，我们提出了IQSPred-PLM，这是一种将蛋白质语言模型（PLMs）与卷积神经网络（CNN）相结合的预测qsp的新模型。首先，我们利用预训练的PLM ESM-2编码肽序列。然后，使用多尺度残差CNN （MSRes-CNN）进行特征提取，并通过自适应权调制（AWM）模块进行动态特征集成。最后，设计了一个全连接网络对qsp进行分类。在基准数据集上进行评估，IQSPred-PLM的预测准确率（ACC）、马修斯相关系数（MCC）和受试者工作特征曲线下面积（AUC）分别为97.50%、0.951和0.990，表现出优异的预测性能。此外，案例研究和可解释性分析证实了IQSPred-PLM在QSP预测任务中的有效性。


Citations: 0

Adaptive Graph Prompting Meets Contrastive Learning: A Multi-View Framework for Metabolite-Disease Association Prediction.

IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences

Pub Date : 2025-08-22 DOI: 10.1007/s12539-025-00751-1

Xiaoxin Du, Xue Yang, Bo Wang, Mei Jin, Yiping Wang, Changrong Li, Peilong Wu

Metabolite-disease associations (MDAs) are critical for advancing precision medicine, yet existing computational methods face challenges in data sparsity, noise robustness, and feature representation. We propose GPLCL (graph prompt-enhanced contrastive learning), a novel multi-view graph learning framework integrating adaptive graph prompting and contrastive learning. GPLCL introduces enhanced graph prompt features (GPF+) with attention-based node adaptation, enabling dynamic feature recalibration. Through strategic graph augmentation and self-supervised contrastive optimization, it preserves essential topological invariants while aggregating multi-scale neighborhood patterns via HeteroGraphSAGE. In fivefold cross-validation, GPLCL achieves an AUC of 0.9761 and an AUPR of 0.9729 on Dataset 1, an improvement of 0.55 to 6.37 percentage points over existing methods. On the highly noisy Dataset 2, GPLCL still maintains an AUC of 0.9576 and an AUPR of 0.9499, indicating strong performance and robustness. Case studies on type 1 diabetes, obesity, and Parkinson's disease highlighted the model's potential in discovering novel MDAs, underscoring its applicability in advancing metabolomics research and translational medicine. The code is publicly available at https://github.com/yxue9/GPLCL .
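
The abstract does not state which contrastive objective GPLCL optimizes. As a generic illustration of self-supervised contrastive learning across two graph views, the sketch below uses the common InfoNCE loss. This choice is an assumption, not the authors' confirmed method, and the function names, the temperature value, and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss over nodes.

    Node i in view_a is the positive partner of node i in view_b;
    every other node in view_b serves as a negative.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum - logits[i]  # equals -log softmax(logits)[i]
    return total / n


# Hypothetical 2-D embeddings of two nodes under two augmented views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # each node agrees with its own partner
shuffled = [[0.0, 1.0], [1.0, 0.0]]  # each node agrees with the wrong partner

print(info_nce(view_a, aligned), info_nce(view_a, shuffled))
```

The loss is low when each node's embedding agrees across the two views and high when it agrees with a different node instead, which is the pressure that pushes an encoder toward representations that are stable under graph augmentation.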



Citations: 0