Pub Date: 2025-09-01 | Epub Date: 2024-12-16 | DOI: 10.1007/s12539-024-00676-1
Jinyun Niu, Fangfang Zhu, Donghai Fang, Wenwen Min
The advent of spatially resolved transcriptomics (SRT) has provided critical insights into the spatial context of tissue microenvironments. Spatial clustering is a fundamental step in analyzing SRT data. However, spatial clustering methods often suffer from instability caused by the sparsity and high noise of SRT data. To address this challenge, we propose SpatialCVGAE, a consensus clustering framework designed for SRT data analysis. SpatialCVGAE takes the expression of highly variable genes from different dimensions, along with multiple spatial graphs, as inputs to variational graph autoencoders (VGAEs), learning multiple latent representations for clustering. These clustering results are then integrated using a consensus clustering approach, which enhances the model's stability and robustness by combining multiple clustering outcomes. Experiments demonstrate that SpatialCVGAE effectively mitigates the instability typically associated with non-ensemble deep learning methods, significantly improving both the stability and accuracy of the results. Compared to previous non-ensemble methods in representation learning and post-processing, our method fully leverages the diversity of multiple representations to accurately identify spatial domains, showing superior robustness and adaptability. All code and public datasets used in this paper are available at https://github.com/wenwenmin/SpatialCVGAE.
{"title":"SpatialCVGAE: Consensus Clustering Improves Spatial Domain Identification of Spatial Transcriptomics Using VGAE.","authors":"Jinyun Niu, Fangfang Zhu, Donghai Fang, Wenwen Min","doi":"10.1007/s12539-024-00676-1","DOIUrl":"10.1007/s12539-024-00676-1","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"497-518"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142828461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
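The SpatialCVGAE abstract says that several clustering results are merged by consensus but does not say how. One common way to do this, shown here purely as an illustrative assumption and not as the authors' method, is a co-association matrix: count how often each pair of spots lands in the same cluster across runs, then group pairs that agree in more than half of the runs. The function names and the 0.5 threshold are hypothetical.

```python
def co_association(labelings):
    """Fraction of labelings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    runs = len(labelings)
    return [[c / runs for c in row] for row in counts]


def consensus_labels(labelings, threshold=0.5):
    """Merge items whose co-association exceeds `threshold` (an assumed rule)."""
    n = len(labelings[0])
    matrix = co_association(labelings)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three toy clusterings of six spots; cluster ids differ between runs.
runs = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 0, 1, 2, 2],
]
print(consensus_labels(runs))
```

Because the matrix only records whether two spots share a cluster, the arbitrary cluster ids of individual runs do not need to be aligned first.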
Emerging evidence highlights long non-coding RNAs (lncRNAs) as pivotal regulators that are significantly linked to diverse human pathologies through expression dynamics and regulatory cascades. This study develops an algorithm for predicting associations between lncRNAs and diseases based on multi-kernel learning-driven weighted nuclear norm regularization (MLWNNR). Specifically, our framework first uses a kernel learning algorithm centered on k-nearest neighbors to integrate multiple similarity kernels. Then, we construct a heterogeneous lncRNA-disease association network from the similarity information and the confirmed lncRNA-disease associations. Finally, we adopt weighted nuclear norm regularization to complete the heterogeneous network and derive the final association prediction scores. MLWNNR achieves strong performance on three datasets and outperforms six representative models in comparative experiments, which demonstrates its robustness and generalization ability. Furthermore, in case studies centered on three common human diseases, the majority of the predicted associations are corroborated by experimental literature. According to the experimental results, MLWNNR is a reliable approach for inferring lncRNA-disease associations.
{"title":"MLWNNR: LncRNA-Disease Association Prediction with Multi-Kernel Learning-Driven Weighted Nuclear Norm Regularization.","authors":"Guo-Bo Xie, Hao-Jie Xu, Guo-Sheng Gu, Zhi-Yi Lin, Jun-Rui Yu, Rui-Bin Chen","doi":"10.1007/s12539-025-00717-3","DOIUrl":"10.1007/s12539-025-00717-3","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"673-690"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144475105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
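For readers unfamiliar with the regularizer named in the MLWNNR abstract, the weighted nuclear norm is usually defined as below, together with a generic matrix-completion problem that uses it. The abstract does not state the exact objective or the choice of weights, so the symbols here ($M$ for the heterogeneous network matrix, $\Omega$ for its known entries, $P_{\Omega}$ for the projection onto them, $w_i$ for the weights) are assumptions made for illustration only.

```latex
% Weighted nuclear norm: a weighted sum of the singular values \sigma_i(X).
\|X\|_{w,*} = \sum_{i} w_i \, \sigma_i(X), \qquad w_i \ge 0

% Generic completion problem (assumed form, not taken from the paper):
\min_{X} \; \|X\|_{w,*}
\quad \text{subject to} \quad
P_{\Omega}(X) = P_{\Omega}(M)
```

Setting every $w_i = 1$ recovers the ordinary nuclear norm; smaller weights on the large singular values penalize the dominant structure less.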
Accurate identification of ncRNA-protein interactions (NPIs) is critical for understanding various cellular activities and biological functions of ncRNAs and proteins. Many sequence-, structure-, and graph-based computational approaches have been developed to identify NPIs from large-scale ncRNA and protein data in a high-throughput manner. However, these approaches often ignore either the topological information in NPIs or the influence of other molecular networks on NPI prediction. In this work, we propose NPI-HGNN, an end-to-end graph neural network (GNN)-based approach for identifying NPIs from a large heterogeneous network consisting of the ncRNA-protein interaction network, the ncRNA-ncRNA similarity network, and the protein-protein interaction network. To our knowledge, NPI-HGNN is the first GNN-based predictor that integrates related heterogeneous networks for NPI prediction. Experiments on five benchmark datasets demonstrate that NPI-HGNN outperforms several state-of-the-art sequence-, structure-, and graph-based predictors. In addition, we showcase the predictive power of NPI-HGNN by identifying 12 interacting ncRNAs of the pre-mRNA 3' end processing protein, which indicates the effectiveness of the proposed model. The source code of NPI-HGNN is freely available for academic purposes at https://github.com/zhangxin11111/NPI-HGNN.
{"title":"NPI-HGNN: A Heterogeneous Graph Neural Network-Based Approach for Predicting ncRNA-Protein Interactions.","authors":"Xin Zhang, Haofeng Ma, Sizhe Wang, Hao Wu, Yu Jiang, Quanzhong Liu","doi":"10.1007/s12539-025-00689-4","DOIUrl":"10.1007/s12539-025-00689-4","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"649-661"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143467996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-09-01 | Epub Date: 2025-05-17 | DOI: 10.1007/s12539-025-00701-x
Jiancheng Zhong, Yicheng Luo, Chen Yang, Maoqi Yuan, Shaokai Wang
In top-down proteomics, the accurate identification and characterization of proteoforms through mass spectrometry is a critical objective. As a result, achieving accuracy in identification results is essential. Multiple primary structure alterations in proteins generate a diverse range of proteoforms, resulting in an exponential increase in the number of potential proteoforms. Moreover, the absence of a definitive reference set complicates the standardization of results. Therefore, enhancing the accuracy of proteoform characterization remains a significant challenge. We introduce a ResNeXt-based deep learning model, PrSMBooster, for rescoring proteoform-spectrum matches (PrSMs) during proteoform characterization. As an ensemble method, PrSMBooster integrates four machine learning models (logistic regression, XGBoost, decision tree, and support vector machine) as weak learners to obtain PrSM features. The basic and latent features of each PrSM are subsequently input into the ResNeXt model for final rescoring. To verify the effect and accuracy of PrSMBooster in rescoring proteoform characterization, it was compared with the characterization algorithm TopPIC across 47 independent mass spectrometry datasets from various species. The experimental results indicate that, in most mass spectrometry datasets, the number of PrSMs obtained at a false discovery rate (FDR) of 1% increases after rescoring with PrSMBooster. Further analysis of the experimental results confirmed that PrSMBooster improves the accuracy of PrSM scoring, generates more mass spectrometry characterization results, and demonstrates strong generalization ability.
{"title":"ResNeXt-Based Rescoring Model for Proteoform Characterization in Top-Down Mass Spectra.","authors":"Jiancheng Zhong, Yicheng Luo, Chen Yang, Maoqi Yuan, Shaokai Wang","doi":"10.1007/s12539-025-00701-x","DOIUrl":"10.1007/s12539-025-00701-x","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"634-648"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12289818/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144086199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
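The PrSMBooster abstract reports the number of PrSMs accepted at an FDR of 1% without saying how the FDR is estimated. The sketch below assumes the target-decoy estimate that is common in proteomics (decoy hits divided by target hits above a score cutoff). The function name and the `(score, is_decoy)` input format are hypothetical, and higher scores are assumed to be better.

```python
def count_at_fdr(matches, fdr_cutoff=0.01):
    """Largest number of target matches whose estimated FDR stays within the cutoff.

    `matches` is a list of (score, is_decoy) pairs; higher score = better match.
    FDR at a given rank is estimated as decoys / targets seen so far.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= fdr_cutoff:
            best = targets
    return best


# Toy example: five target matches and two decoy matches.
toy = [(9, False), (8, False), (7, False), (6, True),
       (5, False), (4, True), (3, False)]
print(count_at_fdr(toy, fdr_cutoff=0.1))
print(count_at_fdr(toy, fdr_cutoff=0.3))
```

Under this estimate, a rescoring model helps when it pushes decoy matches down the ranking, so more target matches fit under the same cutoff.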
Pub Date: 2025-09-01 | Epub Date: 2025-01-06 | DOI: 10.1007/s12539-024-00681-4
Peng-Cheng Zhao, Xue-Xin Wei, Qiong Wang, Hao-Yang Wang, Bing-Xue Du, Jia-Ning Li, Bei Zhu, Hui Yu, Jian-Yu Shi
Metabolism in vivo turns small molecules (e.g., drugs) into metabolites (new molecules), which can raise unexpected safety issues in drug development. However, determining metabolites through biological assays is costly. Recent computational methods offer promising alternatives by predicting possible metabolites. Rule-based methods use predefined reaction-derived rules to infer metabolites, but they cannot handle new metabolic reaction patterns. In contrast, rule-free methods leverage sequence-to-sequence machine translation to generate metabolites. Nevertheless, they characterize molecular structures insufficiently and offer weak interpretability. To address these issues in rule-free methods, this manuscript proposes a novel metabolism type-aware graph generative framework (MTGGF) for molecular metabolite prediction. It uses a two-stage learning process: pre-training on a large general chemical reaction dataset, followed by fine-tuning on three smaller type-specific metabolic reaction datasets. Its core, an elaborate graph-to-graph generative model, treats both atoms and bonds as bipartite vertices, and molecules as bipartite graphs, so that it can embed rich information about molecular structures and ensure the integrity of generated metabolite structures. The comparison with state-of-the-art methods demonstrates its superiority. Furthermore, the ablation study validates the contributions of its two graph encoding components and its reaction-type-specific fine-tuning models. More importantly, based on interactive attention between a molecule and its metabolites, the case studies on five approved drugs reveal crucial substructures specific to metabolism types. It is anticipated that this framework can improve the risk evaluation of drug metabolites. The code is available at https://github.com/zpczaizheli/Metabolite.
{"title":"MTGGF: A Metabolism Type-Aware Graph Generative Model for Molecular Metabolite Prediction.","authors":"Peng-Cheng Zhao, Xue-Xin Wei, Qiong Wang, Hao-Yang Wang, Bing-Xue Du, Jia-Ning Li, Bei Zhu, Hui Yu, Jian-Yu Shi","doi":"10.1007/s12539-024-00681-4","DOIUrl":"10.1007/s12539-024-00681-4","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"528-540"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142931780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
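The MTGGF abstract describes molecules as bipartite graphs in which atoms and bonds are both vertices. A minimal sketch of that representation follows. The data layout and the function name `molecule_to_bipartite` are assumptions made for illustration, not the authors' implementation: each bond becomes its own vertex, connected to the two atoms it joins.

```python
def molecule_to_bipartite(atoms, bonds):
    """Build a bipartite graph with atom vertices on one side, bond vertices on the other.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (atom_index_a, atom_index_b, bond_order) tuples
    Returns (atom_vertices, bond_vertices, edges); every edge links one
    atom vertex to one bond vertex, so no two atoms are adjacent directly.
    """
    atom_vertices = [("atom", i, symbol) for i, symbol in enumerate(atoms)]
    bond_vertices = [("bond", k, order) for k, (_, _, order) in enumerate(bonds)]
    edges = []
    for k, (a, b, _) in enumerate(bonds):
        edges.append((("atom", a), ("bond", k)))
        edges.append((("atom", b), ("bond", k)))
    return atom_vertices, bond_vertices, edges


# Heavy atoms of ethanol: C-C-O with two single bonds.
atoms_v, bonds_v, edges = molecule_to_bipartite(
    ["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]
)
print(len(atoms_v), len(bonds_v), len(edges))
```

Because bonds carry their own vertex, bond features such as order can be embedded in the same way as atom features.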
With the development of single-cell RNA-sequencing (scRNA-seq) technology, scRNA-seq data analysis faces major challenges due to large scale, high dimensionality, high noise, and high sparsity. To obtain accurate embedded representations of large-scale scRNA-seq data, we design a novel graph convolutional network with an adaptive aggregation mechanism. Based on the assumption that the aggregation order differs between cells, we develop scAGCN, a dimensionality reduction algorithm for scRNA-seq data built on a graph convolutional network with adaptive aggregation. In scAGCN, a preprocessing step consisting of quality control and feature selection is performed first. Then, an approximate nearest neighbor graph is rapidly constructed. Finally, a graph convolutional network with an adaptive aggregation mechanism is constructed, in which a neighborhood selection strategy based on node distribution and similarity boxplots is designed, and the aggregation function is optimized by defining a similarity measure between neighborhood nodes and the central node. The results show that scAGCN outperforms existing dimensionality reduction methods on 15 real scRNA-seq datasets, especially on 10 large-scale scRNA-seq datasets.
{"title":"ScAGCN: Graph Convolutional Network with Adaptive Aggregation Mechanism for scRNA-seq Data Dimensionality Reduction.","authors":"Xiaoshu Zhu, Liquan Zhao, Fei Teng, Shuang Meng, Miao Xie","doi":"10.1007/s12539-025-00702-w","DOIUrl":"10.1007/s12539-025-00702-w","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"576-585"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143995198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-30 | DOI: 10.1007/s12539-025-00763-x
Siqi Bao, Zijian Yang, Zicheng Zhang, Jia Qu, Jie Sun
{"title":"AttResAMD: An Attention-Driven Deep Learning Framework for Expert-Level Automated Classification of Age-Related Macular Degeneration from Fundus Photography.","authors":"Siqi Bao, Zijian Yang, Zicheng Zhang, Jia Qu, Jie Sun","doi":"10.1007/s12539-025-00763-x","DOIUrl":"https://doi.org/10.1007/s12539-025-00763-x","url":null,"abstract":"","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-30 | DOI: 10.1007/s12539-025-00760-0
Teng Zhang, Lian Liu
{"title":"m<sup>6</sup>ADP-GCNPUAS: m<sup>6</sup>A-Disease Prediction via Graph Convolutional Network and Positive-Unlabeled Learning with Self-Adaptive Sampling.","authors":"Teng Zhang, Lian Liu","doi":"10.1007/s12539-025-00760-0","DOIUrl":"https://doi.org/10.1007/s12539-025-00760-0","url":null,"abstract":"","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144953027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-26 | DOI: 10.1007/s12539-025-00766-8
Yusen Su, Qingyang Guo, Taigang Liu
Quorum sensing regulates cooperative behaviors in bacteria through the accumulation and detection of signaling molecules. This process plays a crucial role in various biological functions, including biofilm formation, antibiotic production, regulation of virulence factors, and immune modulation. Quorum sensing peptides (QSPs), primarily produced by Gram-positive bacteria, are key components of the quorum sensing mechanism, and their identification is crucial for understanding bacterial regulation. Although several QSP prediction tools based on handcrafted features and machine learning techniques are available, there is still room to improve their performance and interpretability. In this study, we present IQSPred-PLM, a novel model for predicting QSPs that integrates protein language models (PLMs) with a convolutional neural network (CNN). First, we use the pre-trained PLM ESM-2 to encode peptide sequences. Then, feature extraction is performed using a multi-scale residual CNN (MSRes-CNN), with dynamic feature integration through an adaptive weight modulation (AWM) module. Finally, a fully connected network classifies the QSPs. Evaluated on the benchmark dataset, IQSPred-PLM demonstrated outstanding predictive performance, with an accuracy (ACC), Matthews correlation coefficient (MCC), and area under the receiver operating characteristic (ROC) curve (AUC) of 97.50%, 0.951, and 0.990, respectively. Furthermore, case studies and interpretability analyses confirmed the effectiveness of IQSPred-PLM for the QSP prediction task.
{"title":"IQSPred-PLM: An Interpretable Quorum Sensing Peptides Prediction Model Based on Protein Language Model.","authors":"Yusen Su, Qingyang Guo, Taigang Liu","doi":"10.1007/s12539-025-00766-8","DOIUrl":"https://doi.org/10.1007/s12539-025-00766-8","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144953063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
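The IQSPred-PLM abstract reports ACC and MCC. As a reminder of how these two metrics are computed for a binary classifier, the sketch below uses the standard confusion-matrix formulas. The counts in the example are made up for illustration and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion matrix: 45 TP, 45 TN, 5 FP, 5 FN.
print(accuracy(45, 45, 5, 5))
print(round(mcc(45, 45, 5, 5), 3))
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for imbalanced peptide datasets.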
Pub Date: 2025-08-22 | DOI: 10.1007/s12539-025-00751-1
Xiaoxin Du, Xue Yang, Bo Wang, Mei Jin, Yiping Wang, Changrong Li, Peilong Wu
Metabolite-disease associations (MDAs) are critical for advancing precision medicine, yet existing computational methods face challenges in data sparsity, noise robustness, and feature representation. We propose GPLCL (graph prompt-enhanced contrastive learning), a novel multi-view graph learning framework integrating adaptive graph prompting and contrastive learning. GPLCL introduces enhanced graph prompt features (GPF+) with attention-based node adaptation, enabling dynamic feature recalibration. Through strategic graph augmentation and self-supervised contrastive optimization, it preserves essential topological invariants while aggregating multi-scale neighborhood patterns via HeteroGraphSAGE. In fivefold cross-validation, GPLCL achieves an AUC of 0.9761 and an AUPR of 0.9729 on Dataset 1, an improvement of 0.55 to 6.37 percentage points over existing methods; on the highly noisy Dataset 2, it still maintains an AUC of 0.9576 and an AUPR of 0.9499, which demonstrates its performance and robustness. Case studies on type 1 diabetes, obesity, and Parkinson's disease highlight the model's potential for discovering novel MDAs, underscoring its applicability in advancing metabolomics research and translational medicine. The code is publicly available at https://github.com/yxue9/GPLCL.
{"title":"Adaptive Graph Prompting Meets Contrastive Learning: A Multi-View Framework for Metabolite-Disease Association Prediction.","authors":"Xiaoxin Du, Xue Yang, Bo Wang, Mei Jin, Yiping Wang, Changrong Li, Peilong Wu","doi":"10.1007/s12539-025-00751-1","DOIUrl":"https://doi.org/10.1007/s12539-025-00751-1","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}