Pub Date: 2025-04-03 | DOI: 10.1007/s12539-025-00700-y
Qiguo Dai, Wuhao Liu, Xianhai Yu, Xiaodong Duan, Ziqiang Liu
Accurately identifying cell types in single-cell RNA sequencing data is critical for understanding cellular differentiation and pathological mechanisms in downstream analysis. Because traditional biological approaches are laborious and time-intensive, computational methods for cell classification are needed. However, existing methods struggle to exploit the gene expression information contained in the large volume of unlabeled cell data, which limits their classification and generalization performance. We therefore propose scSSGC, a novel self-supervised graph representation learning framework for single-cell classification. Specifically, in the self-supervised pre-training stage, multiple K-means clustering tasks conducted on unlabeled cell data are jointly employed for model training, thereby mitigating the issue of limited labeled data. To effectively capture potential interactions among cells, we introduce a locally augmented graph neural network that enhances information aggregation for nodes with few neighbors in the cell graph. A range of benchmark experiments demonstrates that scSSGC outperforms existing state-of-the-art cell classification methods. More importantly, scSSGC performs stably in cross-dataset settings, indicating better generalization ability.
{"title":"Self-Supervised Graph Representation Learning for Single-Cell Classification.","authors":"Qiguo Dai, Wuhao Liu, Xianhai Yu, Xiaodong Duan, Ziqiang Liu","doi":"10.1007/s12539-025-00700-y","DOIUrl":"https://doi.org/10.1007/s12539-025-00700-y","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143780053","RegionNum":2,"RegionCategory":"Biology"}
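The pre-training idea in the scSSGC abstract (several K-means tasks on unlabeled cells, used jointly) can be illustrated with a minimal sketch. It only shows how one pseudo-label vector per value of k might be produced; the function names, the toy data, and the choice of k values are assumptions made for illustration, and the graph neural network that would be trained on these labels is not shown.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def multi_kmeans_pseudo_labels(points, ks, seed=0):
    """One pseudo-label vector per value of k (hypothetical helper)."""
    return {k: kmeans(points, k, seed=seed + k) for k in ks}

# Toy "expression profiles" for six cells (made-up numbers).
cells = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [9.0, 0.1], [9.1, 0.0]]
pseudo = multi_kmeans_pseudo_labels(cells, ks=[2, 3])
```

Each entry of `pseudo` could serve as the target of a separate classification head during pre-training. K-means depends on its initialization, so the exact labels vary with the seed.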
Identification of drug-target interactions (DTIs) is critical for drug discovery and drug repositioning. However, most DTI methods that extract features from drug molecules and protein entities neglect the specific substructure information underlying pharmacological responses, which leads to poor predictive performance. Moreover, most existing methods rely on either molecular graphs or molecular descriptors to obtain abstract representations of molecules, and combining the two feature learning approaches for DTI prediction remains unexplored. We therefore propose ASCS-DTI, a new framework for DTI prediction that uses a substructure attention mechanism to flexibly capture compound substructures at different granularities, allowing the important substructure information of each molecule to be learned. Additionally, the framework combines three types of molecular fingerprint information to characterize molecular representations comprehensively. A stacked convolutional coding module processes the sequence information of target proteins in a multi-scale and multi-level view. Finally, the feature fusion module performs multi-modal fusion of molecular graph features and molecular fingerprint features, along with multi-modal information encoding of DTIs. The method outperforms six advanced baseline models on the Biosnap, BindingDB, and Human benchmark datasets, with a significant improvement in performance, and it maintains strong results across different experimental settings.
{"title":"Sensing Compound Substructures Combined with Molecular Fingerprinting to Predict Drug-Target Interactions.","authors":"Wanhua Huang, Xuecong Tian, Ying Su, Sizhe Zhang, Chen Chen, Cheng Chen","doi":"10.1007/s12539-025-00698-3","DOIUrl":"https://doi.org/10.1007/s12539-025-00698-3","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143772098","RegionNum":2,"RegionCategory":"Biology"}
Pub Date: 2025-03-27 | DOI: 10.1007/s12539-025-00694-7
Yuxiao Wei, Zhebin Tan, Liwei Liu
Circular RNAs (circRNAs) are single-stranded non-coding RNA molecules distinguished by their closed circular structure. The interaction between circRNAs and RNA-binding proteins (RBPs) plays a key role in biological functions and is crucial for studying post-transcriptional regulatory mechanisms. Genome-wide circRNA binding event data obtained by cross-linking immunoprecipitation sequencing provide a foundation for constructing efficient computational prediction models. However, although machine learning techniques have been applied to predict circRNA-RBP interaction sites, existing methods still have room for improvement in accuracy and lack interpretability. We propose CR-deal, an interpretable joint deep learning network that predicts circRNA-RBP binding sites from genome-wide circRNA data. CR-deal uses a graph attention network to unify sequence and structural features into the same view, making more effective use of structural features to improve accuracy. It can infer marker genes in the binding site through integrated gradient feature interpretation, thereby inferring functional structural regions in the binding site. We benchmarked CR-deal on 37 circRNA datasets and 7 lncRNA datasets, and we examined its interpretability and identified functional structural regions using 5 circRNA datasets. We believe that CR-deal can help researchers gain a deeper understanding of the functions and mechanisms of circRNA in living organisms and its critical role in the occurrence and development of diseases. The source code of CR-deal is available free of charge at https://github.com/liuliwei1980/CR .
{"title":"CR-deal: Explainable Neural Network for circRNA-RBP Binding Site Recognition and Interpretation.","authors":"Yuxiao Wei, Zhebin Tan, Liwei Liu","doi":"10.1007/s12539-025-00694-7","DOIUrl":"https://doi.org/10.1007/s12539-025-00694-7","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143718686","RegionNum":2,"RegionCategory":"Biology"}
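The CR-deal abstract mentions integrated-gradient feature interpretation. As a rough illustration of that attribution technique only, the sketch below applies it to a toy sigmoid scorer rather than to CR-deal itself. The model, weights, input, and all-zero baseline are assumptions, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable scorer: sigmoid of a weighted sum (not CR-deal)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to the input x."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the input.
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            avg_grad[i] += g / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [(xi - bi) * g for xi, bi, g in zip(x, baseline, avg_grad)]

w, b = [1.5, -2.0, 0.5], 0.1                      # illustrative weights
x, baseline = [1.0, 0.0, 1.0], [0.0, 0.0, 0.0]    # illustrative input and baseline
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is the completeness property: the attributions should sum, up to approximation error, to the difference between the model output at the input and at the baseline. A feature equal to its baseline value receives zero attribution.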
Pub Date: 2025-03-24 | DOI: 10.1007/s12539-025-00693-8
Xiaoyi Yu, Donglin Zhu, Hongjie Guo, Changjun Zhou, Mohammed A M Elhassan, Mengzhen Wang
Clear cell renal cell carcinoma (ccRCC) is the most common form of renal cell carcinoma in adults, comprising approximately 80% of cases. The lethality of ccRCC rises significantly at stage III or beyond, emphasizing the need for early detection to enable timely therapeutic interventions. This study introduces a non-invasive and efficient classification method, Domain Adaptive Squeeze-and-Excitation Network (DASNet), for grading ccRCC through Computed Tomography (CT) images using advanced deep learning and machine learning techniques. The dataset is enhanced using MedAugment technology and balanced to improve generalization and classification performance. To mitigate overfitting, renal angiomyolipoma (AML) samples are incorporated, increasing data diversity and model robustness. EfficientNet and RegNet serve as foundational models, leveraging local feature extraction and Squeeze-and-Excitation (SE) attention mechanisms to enhance recognition accuracy across grades. Furthermore, Domain-Adversarial Neural Networks (DANNs) are employed to maintain consistency between source and target domains, bolstering the model's generalization ability. The proposed model achieves a classification accuracy of 97.50%, demonstrating efficacy in early ccRCC grade identification. These findings not only offer valuable clinical insights but also establish a foundation for broader application of deep learning in tumor detection.
{"title":"DASNet: A Convolutional Neural Network with SE Attention Mechanism for ccRCC Tumor Grading.","authors":"Xiaoyi Yu, Donglin Zhu, Hongjie Guo, Changjun Zhou, Mohammed A M Elhassan, Mengzhen Wang","doi":"10.1007/s12539-025-00693-8","DOIUrl":"https://doi.org/10.1007/s12539-025-00693-8","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143700390","RegionNum":2,"RegionCategory":"Biology"}
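The Squeeze-and-Excitation (SE) attention named in the DASNet abstract can be sketched in a few lines. This is a generic SE block in pure Python, not the DASNet architecture. The feature-map values and the two weight matrices are arbitrary illustrative numbers, and bias terms are omitted for brevity.

```python
import math

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation over a feature map shaped [channels][height][width]."""
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every channel by its gate, a value in (0, 1).
    scaled = [
        [[v * g for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]
    return scaled, gates

# Two channels of a 2x2 map; all numbers are made up for illustration.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]
w1 = [[0.5, -0.25]]      # reduce 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]     # expand 1 hidden unit to 2 channel gates
scaled, gates = se_block(fmap, w1, w2)
```

In a real network the two weight matrices are learned, so the gates come to emphasize channels that are informative for the task.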
Pub Date: 2025-03-18 | DOI: 10.1007/s12539-025-00699-2
Xu Luo, Xinpeng Zhang, Dongqing Su, Honghao Li, Min Zou, Yuqiang Xiong, Lei Yang
As a common malignancy of the lower respiratory tract, non-small cell lung cancer (NSCLC) represents a major oncological challenge globally, characterized by high incidence and mortality rates. Recent research highlights the critical involvement of somatic mutations in the onset and development of NSCLC. Stratifying NSCLC patients based on somatic mutation data could help identify patients likely to respond to personalized therapeutic strategies. However, such stratification is challenging because somatic mutation data are sparse. In this study, based on sparse somatic mutation data from 4581 NSCLC patients in the Memorial Sloan Kettering Cancer Center (MSKCC) database, we systematically evaluate metabolic pathway activity in NSCLC patients by applying a network propagation algorithm and computational biology algorithms. Using the metabolic pathways associated with prognosis, as identified through univariate Cox regression analysis, NSCLC patients are stratified with a deep clustering algorithm to explore the optimal classification strategy, thereby establishing biologically meaningful metabolic subtypes of NSCLC patients. The NSCLC metabolic subtypes obtained from the network propagation and deep clustering algorithms are systematically evaluated and validated for survival benefits of immunotherapy. Our research marks progress towards a universal approach for classifying NSCLC patients based solely on somatic mutation profiles, using a deep clustering algorithm. This work will help deepen the analysis of NSCLC patients' metabolic subtypes from the perspective of the tumor microenvironment, providing a strong basis for formulating more precise personalized treatment plans.
{"title":"Deep Clustering-Based Metabolic Stratification of Non-Small Cell Lung Cancer Patients Through Integration of Somatic Mutation Profile and Network Propagation Algorithm.","authors":"Xu Luo, Xinpeng Zhang, Dongqing Su, Honghao Li, Min Zou, Yuqiang Xiong, Lei Yang","doi":"10.1007/s12539-025-00699-2","DOIUrl":"https://doi.org/10.1007/s12539-025-00699-2","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143657203","RegionNum":2,"RegionCategory":"Biology"}
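The abstract does not say which network propagation variant is used. One common choice is random walk with restart, and the sketch below assumes it in order to show how a sparse mutation vector can be smoothed over a gene network. The four-gene path graph, the restart parameter `alpha`, and the function name are illustrative assumptions.

```python
def propagate(adjacency, f0, alpha=0.7, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W F + (1 - alpha) * F0, with W the column-normalized adjacency."""
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    w = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    f = list(f0)
    for _ in range(max_iter):
        new_f = [
            alpha * sum(w[i][j] * f[j] for j in range(n)) + (1 - alpha) * f0[i]
            for i in range(n)
        ]
        # Stop once the scores no longer change appreciably.
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f

# Toy gene network: a path 0 - 1 - 2 - 3, with a mutation only in gene 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
mutations = [1.0, 0.0, 0.0, 0.0]
smoothed = propagate(adj, mutations)
```

After propagation every gene carries a non-zero score that decays with network distance from the mutated gene, which turns a sparse binary profile into a dense one that clustering methods can work with.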
Pub Date: 2025-03-15 | DOI: 10.1007/s12539-025-00695-6
Shiyu Yan, Gang Yu, Jiaoxing Yang, Lingna Chen
Drug combination therapy has shown improved efficacy and decreased adverse effects, making it a practical approach for conditions like cancer. However, discovering all potential synergistic drug combinations requires extensive experimentation, which can be challenging. Recent research using deep learning techniques has shown promise in reducing the number of experiments and overall workload by predicting synergistic drug combinations. Therefore, developing reliable and effective computational methods for predicting these combinations is essential. This paper proposes a novel method called Drug-molecule Connect Cell (DconnC) for predicting synergistic drug combinations. DconnC uses cellular features as nodes to establish connections between drug molecular structures, allowing the extraction of pertinent features. These features are then optimized through self-augmented contrastive learning using bidirectional recurrent neural networks (Bi-RNN) and long short-term memory (LSTM) models, ultimately predicting drug synergy. By integrating information about the molecular structure of drugs into the extraction of cell features, DconnC uncovers the inherent connection between drug molecular structures and cellular characteristics, thus improving prediction accuracy. The performance of the method is evaluated using five-fold cross-validation, demonstrating a 35% reduction in the mean square error (MSE) compared to the next-best method. Moreover, the method significantly outperforms alternative approaches on various evaluation criteria, particularly in predictions across different cell lines and Loewe synergy score intervals.
{"title":"Predicting Synergistic Drug Combinations Based on Fusion of Cell and Drug Molecular Structures.","authors":"Shiyu Yan, Gang Yu, Jiaoxing Yang, Lingna Chen","doi":"10.1007/s12539-025-00695-6","DOIUrl":"https://doi.org/10.1007/s12539-025-00695-6","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2025-03-15","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143634005","RegionNum":2,"RegionCategory":"Biology"}
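The evaluation protocol in the abstract (five-fold cross-validation, with a relative reduction in MSE) can be made concrete with a small sketch. The helper names are hypothetical and the two MSE values are made-up numbers chosen to give a 35% reduction. They are not results from the paper.

```python
def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def mse(y_true, y_pred):
    """Mean square error between two equally long sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error

folds = k_fold_indices(12, k=5)
example_mse = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
# Hypothetical MSE values, used only to show the arithmetic.
reduction = relative_reduction(0.20, 0.13)
```

Each fold serves once as the held-out test set while the remaining folds are used for training. In practice the sample order would normally be shuffled before splitting.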
The emergence of methicillin-resistant Staphylococcus aureus (MRSA) as a recognized cause of community-acquired and hospital infections has created a need for efficient and accurate identification of peptides with anti-MRSA properties in drug discovery and development pipelines. However, current experimental methods tend to be labor- and resource-intensive. Thus, there is an immediate need for practical computational solutions that identify anti-MRSA peptides from sequence. Recently, pre-trained protein language models (pLMs) have emerged as a remarkable advancement for encoding peptide sequences as discriminative feature embeddings, uncovering plentiful protein-level information and successfully repurposing it for in silico peptide property prediction. In this study, we present pLM4MRSA, a pLM-based framework designed to enhance the accuracy of anti-MRSA peptide prediction. In this framework, we combine feature embeddings from various pLMs, such as ProtTrans and evolutionary-scale modeling (ESM-2), which provide complementary information for prediction. The individual strengths of these pLMs are integrated to form hybrid feature embeddings. Next, we apply principal component analysis (PCA) to process these hybrid embeddings. The resulting PCA-transformed feature vectors are then used as inputs for constructing the predictive model. Experimental results on the independent test dataset showed that pLM4MRSA achieved a balanced accuracy and Matthews correlation coefficient of 0.983 and 0.980, respectively, representing improvements over state-of-the-art methods of 2.53%-4.83% and 7.73%-13.23%, respectively. This indicates that pLM4MRSA is a high-performance prediction model with a broad scope of applicability. Additionally, comparison with well-known hand-crafted features demonstrated that the proposed hybrid feature embeddings complement each other effectively, capturing discriminative patterns for more accurate anti-MRSA peptide prediction. We anticipate that pLM4MRSA will serve as an effective solution for accurate and high-capacity prediction of anti-MRSA peptides from peptide sequences.
{"title":"Advancing the Accuracy of Anti-MRSA Peptide Prediction Through Integrating Multi-Source Protein Language Models.","authors":"Watshara Shoombuatong, Pakpoom Mookdarsanit, Lawankorn Mookdarsanit, Nalini Schaduangrat, Saeed Ahmed, Muhammad Kabir, Pramote Chumnanpuen","doi":"10.1007/s12539-025-00696-5","DOIUrl":"https://doi.org/10.1007/s12539-025-00696-5","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"143604811","RegionNum":2,"RegionCategory":"Biology"}
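The two headline metrics in the pLM4MRSA abstract, balanced accuracy and the Matthews correlation coefficient (MCC), are both computed from a binary confusion matrix. The sketch below uses the standard definitions. The label vectors are toy data, not the paper's test set.

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC; defined here as 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy labels: 4 positives (anti-MRSA) and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = confusion(y_true, y_pred)
ba = balanced_accuracy(*counts)
mcc = matthews_corrcoef(*counts)
```

Balanced accuracy averages the per-class recall, and MCC uses all four cells of the confusion matrix. This makes both more informative than plain accuracy when the positive and negative classes differ in size.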
Artificial intelligence technology has demonstrated remarkable diagnostic efficacy in modern biomedical image analysis. However, the practical application of artificial intelligence is significantly limited by the presence of similar pathologies among different diseases and the diversity of pathologies within the same disease. To address this issue, this paper proposes a reinforced collaborative-competitive representation classification (RCCRC) method. RCCRC enhances the contribution of different classes by introducing dual competitive constraints into the objective function. The first constraint integrates the collaborative space representation akin to holistic data, promoting the representation contribution of similar classes. The second constraint introduces specific class subspace representations to encourage competition among all classes, enhancing the discriminative nature of representation vectors. By unifying these two constraints, RCCRC effectively explores both global and specific data features in the reconstruction space. Extensive experiments on various biomedical image databases are conducted to exhibit the advantage of the proposed method in comparison with several state-of-the-art classification algorithms.
{"title":"Reinforced Collaborative-Competitive Representation for Biomedical Image Recognition.","authors":"Junwei Jin, Songbo Zhou, Yanting Li, Tanxin Zhu, Chao Fan, Hua Zhang, Peng Li","doi":"10.1007/s12539-024-00683-2","DOIUrl":"10.1007/s12539-024-00683-2","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"215-230"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143004934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
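The RCCRC abstract does not state its objective function, so the following is only a sketch of plain collaborative representation classification (ridge-regularised coding over all training samples, then class-wise reconstruction residuals), which the method's name suggests it extends. It does not implement the paper's two competitive constraints; the function name `crc_classify`, the parameter `lam`, and the toy data are illustrative assumptions.

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.1):
    """Baseline collaborative representation classifier (not RCCRC itself).

    X      : (d, n) array, training samples as columns (the dictionary)
    labels : length-n sequence of integer class labels, one per column of X
    y      : (d,) query sample
    lam    : ridge penalty (hypothetical default, assumed for illustration)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[1]
    # Code the query over ALL classes at once: argmin ||y - X a||^2 + lam ||a||^2.
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        # Reconstruct y from class c's columns and coefficients only.
        residuals[int(c)] = float(np.linalg.norm(y - X[:, mask] @ alpha[mask]))
    # Predict the class whose atoms reconstruct the query best.
    return min(residuals, key=residuals.get), residuals

# Toy data: class 0 lies near the first axis, class 1 near the second.
X_toy = np.array([[1.0, 0.9, 0.0, 0.1],
                  [0.0, 0.1, 1.0, 0.9],
                  [0.0, 0.0, 0.0, 0.0]])
labels_toy = [0, 0, 1, 1]
pred, res = crc_classify(X_toy, labels_toy, np.array([1.0, 0.05, 0.0]))
```

In this toy case the query lies close to the class 0 samples, so its class 0 residual is much smaller than its class 1 residual.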
Pub Date : 2025-03-01 Epub Date: 2024-10-14 DOI: 10.1007/s12539-024-00661-8
Haibin Li, Jun Meng, Zhaowei Wang, Yushi Luan
Primary microRNAs (pri-miRNAs) have been observed to contain translatable small open reading frames (sORFs) that can encode peptides as independent elements. Relevant studies have shown that these sORFs play a significant role in regulating the expression of biological traits. Existing methods for predicting the coding potential of sORFs frequently overlook these data or categorize them as negative samples, impeding the identification of additional translatable sORFs in pri-miRNAs. In light of this, a novel method named misORFPred is proposed. Specifically, an enhanced scalable k-mer (ESKmer) that simultaneously integrates the composition information within a sequence and the distance information between sequences is designed to extract nucleotide sequence features. After feature selection, the optimal features and several machine learning classifiers are combined to construct the ensemble model, in which a newly devised dynamic ensemble voting strategy (DEVS) dynamically adjusts the weights of the base classifiers and adaptively selects the optimal base classifiers for each unlabeled sample. Cross-validation results suggest that ESKmer and DEVS are essential for this classification task and boost model performance. Independent testing results indicate that misORFPred outperforms state-of-the-art methods. Furthermore, we run misORFPred on the genomes of various plant species and perform a thorough analysis of the predicted outcomes. Taken together, misORFPred is a powerful tool for identifying translatable sORFs in plant pri-miRNAs and can provide highly trusted candidates for subsequent biological experiments.
{"title":"misORFPred: A Novel Method to Mine Translatable sORFs in Plant Pri-miRNAs Using Enhanced Scalable k-mer and Dynamic Ensemble Voting Strategy.","authors":"Haibin Li, Jun Meng, Zhaowei Wang, Yushi Luan","doi":"10.1007/s12539-024-00661-8","DOIUrl":"10.1007/s12539-024-00661-8","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"114-133"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142464358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
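The misORFPred abstract names two ingredients, k-mer sequence features and a weighted ensemble vote that picks base classifiers per sample, without giving their definitions. The sketch below shows only the generic building blocks: ordinary normalised k-mer frequencies (not ESKmer, which also encodes distance information) and a weighted soft vote restricted to the classifiers most confident on one sample (a stand-in for, not a reproduction of, DEVS). The function names, `top_m`, and the confidence rule are hypothetical.

```python
from itertools import product

def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Normalised counts of every length-k word, in lexicographic order.

    Plain k-mer composition only; ESKmer's distance information is not modelled.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]

def weighted_soft_vote(probs, weights, top_m=None):
    """Combine per-classifier P(positive) values for ONE sample.

    probs   : probabilities from each base classifier
    weights : positive weight per classifier (assumed given, e.g. from validation)
    top_m   : if set, keep only the top_m classifiers farthest from 0.5,
              an assumed proxy for per-sample classifier selection
    """
    ranked = sorted(zip(probs, weights), key=lambda pw: abs(pw[0] - 0.5), reverse=True)
    if top_m is not None:
        ranked = ranked[:top_m]
    total_w = sum(w for _, w in ranked)
    score = sum(p * w for p, w in ranked) / total_w
    return score, int(score >= 0.5)

features = kmer_frequencies("ATGATG", k=3)
score_all, label_all = weighted_soft_vote([0.9, 0.4, 0.55], [1.0, 1.0, 2.0])
score_top, label_top = weighted_soft_vote([0.9, 0.4, 0.55], [1.0, 1.0, 2.0], top_m=2)
```

A feature vector from `kmer_frequencies` has 4^k entries for the default alphabet, so k is usually kept small for short sequences such as sORFs.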
Pub Date : 2025-03-01 Epub Date: 2024-10-17 DOI: 10.1007/s12539-024-00657-4
Abicumaran Uthamacumaran
Pediatric glioblastoma is a complex dynamical disease that is difficult to treat due to its multiple adaptive behaviors, driven largely by phenotypic plasticity. Integrated data science and network theory pipelines offer novel approaches to studying glioblastoma cell fate dynamics, particularly phenotypic transitions over time. Here we used various single-cell trajectory inference algorithms to infer signaling dynamics regulating pediatric glioblastoma-immune cell networks. We identified GATA2, PTPRZ1, TPT1, MTRNR2L1/2, OLIG1/2, SOX11, FXYD6, SEZ6L, PDGFRA, EGFR, S100B, WNT, TNFα, and NF-kB as critical transition genes or signals regulating glioblastoma-immune network dynamics, revealing potential clinically relevant targets. Further, we reconstructed glioblastoma cell fate attractors and found complex bifurcation dynamics within glioblastoma phenotypic transitions, suggesting that a causal pattern may be driving glioblastoma evolution and cell fate decision-making. Together, our findings have implications for developing targeted therapies against glioblastoma and for the continued integration of quantitative approaches and artificial intelligence (AI) to understand pediatric glioblastoma tumor-immune interactions.
{"title":"Cell Fate Dynamics Reconstruction Identifies TPT1 and PTPRZ1 Feedback Loops as Master Regulators of Differentiation in Pediatric Glioblastoma-Immune Cell Networks.","authors":"Abicumaran Uthamacumaran","doi":"10.1007/s12539-024-00657-4","DOIUrl":"10.1007/s12539-024-00657-4","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"59-85"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142464356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}