Artificial intelligence has demonstrated remarkable diagnostic efficacy in modern biomedical image analysis. However, its practical application is significantly limited by two factors: different diseases can share similar pathologies, and a single disease can present diverse pathologies. To address this issue, this paper proposes a reinforced collaborative-competitive representation classification (RCCRC) method. RCCRC strengthens the contribution of each class by introducing dual competitive constraints into the objective function. The first constraint incorporates a collaborative representation over the holistic data (all training samples), promoting the representation contribution of similar classes. The second constraint introduces class-specific subspace representations that encourage competition among all classes, making the representation vectors more discriminative. By unifying these two constraints, RCCRC exploits both global and class-specific data features in the reconstruction space. Extensive experiments on various biomedical image databases demonstrate the advantage of the proposed method over several state-of-the-art classification algorithms.
Reinforced Collaborative-Competitive Representation for Biomedical Image Recognition. Junwei Jin, Songbo Zhou, Yanting Li, Tanxin Zhu, Chao Fan, Hua Zhang, Peng Li. Interdisciplinary Sciences: Computational Life Sciences, published 2025-01-22. DOI: 10.1007/s12539-024-00683-2
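The abstract does not give RCCRC's objective function, so the sketch below implements only the classic collaborative representation classifier (CRC) that the "collaborative representation" part of the name refers to, not RCCRC itself: a test sample is ridge-coded over all training samples at once, and the class with the smallest regularized class-wise reconstruction residual wins. RCCRC's two competitive constraints are deliberately omitted; the function names and the regularization weight `lam` are hypothetical.

```python
import math

def solve(a, b):
    """Solve a @ x = b for a square matrix a via Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def crc_classify(train, labels, y, lam=0.01):
    """Plain CRC (NOT RCCRC): code y over ALL training samples with a ridge penalty,
    then return the class whose own samples reconstruct y best relative to
    the norm of that class's coefficients."""
    n = len(train)  # train: list of n feature vectors of equal length d
    # Normal equations of ridge regression: (X^T X + lam * I) alpha = X^T y
    gram = [[sum(p * q for p, q in zip(train[i], train[j])) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [sum(p * q for p, q in zip(train[i], y)) for i in range(n)]
    alpha = solve(gram, rhs)
    best, best_score = None, math.inf
    for c in sorted(set(labels)):
        idx = [i for i in range(n) if labels[i] == c]
        # Reconstruct y from class-c samples and their collaborative coefficients only
        recon = [sum(alpha[i] * train[i][k] for i in idx) for k in range(len(y))]
        residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
        coef_norm = math.sqrt(sum(alpha[i] ** 2 for i in idx)) or 1e-12
        score = residual / coef_norm
        if score < best_score:
            best, best_score = c, score
    return best

train = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
labels = ["a", "a", "b", "b"]
print(crc_classify(train, labels, [0.95, 0.05, 0.0]))
```

Because every class competes to explain the same coefficient vector, CRC already has a mild competitive flavour; the abstract indicates RCCRC reinforces this with explicit constraints, which would change the objective solved in `crc_classify`.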
The Waddington landscape was initially proposed to depict cell differentiation and has since been extended to explain phenomena such as reprogramming. The landscape is a concrete representation of cellular differentiation potential, yet how to represent this potential precisely remains an unsolved problem, which poses significant challenges to reconstructing the Waddington landscape. Cellular differentiation potential is typically characterized by the transcriptomic signatures of known markers. Numerous computational models based on various energy indicators, such as Shannon entropy, have been proposed. While these models can effectively characterize cellular differentiation potential, most of them lack a corresponding dynamical interpretation, which is crucial for understanding cell fate transitions. Therefore, from the perspective of cell migration and proliferation, a framework was developed that computes a dynamically interpretable energy indicator and reconstructs the Waddington landscape based on sparse autoencoders and the reaction-diffusion-advection equation. Within this framework, typical developmental processes, such as hematopoiesis and reprogramming, were dynamically simulated and their corresponding Waddington landscapes were reconstructed. Dynamic simulation and reconstruction were also conducted for special developmental processes, such as embryogenesis and the epithelial-mesenchymal transition (EMT). Ultimately, these diverse cell fate transitions were combined into a unified Waddington landscape.
Reconstructing Waddington Landscape from Cell Migration and Proliferation. Yourui Han, Bolin Chen, Zhongwen Bi, Jianjun Zhang, Youpeng Hu, Jun Bian, Ruiming Kang, Xuequn Shang. Interdisciplinary Sciences: Computational Life Sciences, published 2025-01-07. DOI: 10.1007/s12539-024-00686-z
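The abstract names the reaction-diffusion-advection equation but not its exact form, coefficients, or discretization. As a rough illustration only, assume a 1D cell density u(x, t) obeying du/dt = D * d2u/dx2 - v * du/dx + r * u, reading D as random migration, v as directed drift, and r as a proliferation rate. The sketch below takes explicit Euler steps on a periodic grid; the function names and parameter values are hypothetical, and the paper's energy indicator and sparse-autoencoder embedding are not modeled.

```python
def rda_step(u, dx, dt, D, v, r):
    """One explicit Euler step of du/dt = D*u_xx - v*u_x + r*u on a periodic 1D grid.
    Diffusion uses central differences; advection uses first-order upwinding."""
    n = len(u)
    new = []
    for i in range(n):
        left, right = u[(i - 1) % n], u[(i + 1) % n]
        diffusion = D * (right - 2 * u[i] + left) / dx ** 2
        if v >= 0:  # information flows from the left neighbour
            advection = -v * (u[i] - left) / dx
        else:       # information flows from the right neighbour
            advection = -v * (right - u[i]) / dx
        reaction = r * u[i]  # proliferation (r > 0) or loss (r < 0)
        new.append(u[i] + dt * (diffusion + advection + reaction))
    return new

def simulate(u0, steps, dx=1.0, dt=0.1, D=0.5, v=0.2, r=0.0):
    """Run several steps. Rough stability guide for this explicit scheme:
    D*dt/dx**2 <= 0.5 and |v|*dt/dx <= 1."""
    u = list(u0)
    for _ in range(steps):
        u = rda_step(u, dx, dt, D, v, r)
    return u

print(simulate([0.0, 0.0, 1.0, 0.0, 0.0], steps=10))
```

With r = 0 both spatial terms are conservative on a periodic grid, so total cell mass is preserved; a positive r makes the density grow, which is one simple way proliferation can enter such a model.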
Pub Date: 2025-01-07 | DOI: 10.1007/s12539-024-00678-z
Xiaoxin Du, Jingwei Li, Bo Wang, Jianfei Zhang, Tongxuan Wang, Junqi Wang
Discovering new microbe-related drugs through traditional biological methods is lengthy and costly. To address these issues, a new computational model (NRGCNMDA) is proposed to predict microbe-drug associations. First, Node2vec is used to extract potential associations between microorganisms and drugs, and a heterogeneous network of microbes and drugs is constructed. Then, a graph convolutional network incorporating a fused residual network mechanism (REGCN) learns meaningful high-order similarity features. In addition, conditional random fields (CRF) are applied to ensure that microbes and drugs have similar feature embeddings. Finally, unobserved microbe-drug associations are scored based on the combined embeddings. Experimental results show that NRGCNMDA outperforms several existing deep learning methods, achieving AUC and AUPR values of 95.16% and 93.02%, respectively. Case studies show that NRGCNMDA accurately predicts drugs associated with Enterococcus faecalis and Listeria monocytogenes, as well as microbes associated with ibuprofen and tetracycline.
NRGCNMDA: Microbe-Drug Association Prediction Based on Residual Graph Convolutional Networks and Conditional Random Fields. Interdisciplinary Sciences: Computational Life Sciences, published 2025-01-07.
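The abstract does not specify REGCN's exact layer, so this is a minimal sketch assuming the standard GCN propagation rule with symmetric normalization plus an identity skip connection, H' = ReLU(A_norm H W) + H, applied to one adjacency matrix that holds both microbe and drug nodes. Scoring a microbe-drug pair as the sigmoid of an embedding inner product is also an assumption; the Node2vec and CRF stages are not shown, and all names are hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Symmetric GCN normalization: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def residual_gcn_layer(a_norm, h, w):
    """H' = ReLU(A_norm @ H @ W) + H. The skip connection requires W to be square."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, z[i][j]) + h[i][j] for j in range(len(h[0]))] for i in range(len(h))]

def association_score(h, microbe, drug):
    """Score a microbe-drug pair as the sigmoid of their embedding inner product."""
    dot = sum(x * y for x, y in zip(h[microbe], h[drug]))
    return 1.0 / (1.0 + math.exp(-dot))

# Toy heterogeneous network: node 0 is a microbe, node 1 is a drug, linked by one edge.
a_norm = normalize_adjacency([[0, 1], [1, 0]])
h1 = residual_gcn_layer(a_norm, [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(h1, association_score(h1, 0, 1))
```

The residual path keeps each node's input features in its output, which is one common way to stack several propagation layers without washing out node identity.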
Pub Date: 2025-01-07 | DOI: 10.1007/s12539-024-00680-5
Qi Zhang, Yuxiao Wei, Liwei Liu
Accurate prediction of drug-drug interactions (DDIs) is essential to improve clinical efficacy, avoid adverse effects of drug combination therapy, and enhance drug safety. Researchers have recently developed several computer-aided methods for DDI prediction. However, these methods lack the substructural features that are critical to drug interactions, and they generalize poorly across domains and data distributions. In this work, we present SAGAN, a domain-adaptive, interpretable, substructure-aware graph attention network for DDI prediction. Based on an attention mechanism and an unsupervised clustering algorithm, we propose a new substructure segmentation method that segments each drug molecule into multiple substructures, learns the mechanism of drug interaction from the perspective of interaction, and identifies important interaction regions between drugs. To enhance the model's generalization ability, we improve and apply a conditional domain adversarial network, which achieves cross-domain generalization by alternately optimizing the cross-entropy loss on the source domain and the adversarial loss of the domain discriminator. We evaluate SAGAN against state-of-the-art DDI prediction models on four real-world datasets in both in-domain and cross-domain scenarios, and show that SAGAN achieves the best overall performance. Moreover, visualizations show that SAGAN extracts pharmacologically significant substructures, which can help drug developers screen for undiscovered local interaction sites and provide important information for further optimization of drug structure. The code and datasets are available online at https://github.com/wyx2012/SAGAN .
A Domain Adaptive Interpretable Substructure-Aware Graph Attention Network for Drug-Drug Interaction Prediction. Interdisciplinary Sciences: Computational Life Sciences, published 2025-01-07.
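The abstract says SAGAN improves a conditional domain adversarial network but does not describe the changes. For orientation, the sketch shows two ingredients of the standard conditional-adversarial setup: conditioning the domain discriminator on the flattened outer product of features and class probabilities (the multilinear map), and the binary cross-entropy that the discriminator minimizes while the feature extractor is trained to maximize it, for example through a gradient reversal layer. Whether SAGAN keeps this exact conditioning is an assumption, and the function names are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multilinear_map(features, class_probs):
    """Flattened outer product f (x) g: the discriminator input depends jointly on
    the learned features and the classifier's predicted class distribution."""
    return [f * g for f in features for g in class_probs]

def domain_adversarial_loss(d_source, d_target, eps=1e-12):
    """Binary cross-entropy for a discriminator that outputs P(sample is from source).
    The discriminator minimizes this value; the feature extractor maximizes it."""
    loss_s = -sum(math.log(p + eps) for p in d_source) / len(d_source)
    loss_t = -sum(math.log(1.0 - p + eps) for p in d_target) / len(d_target)
    return loss_s + loss_t

g = softmax([2.0, 0.5])
print(multilinear_map([1.0, 2.0], g), domain_adversarial_loss([0.5], [0.5]))
```

A discriminator that cannot tell the domains apart outputs 0.5 everywhere, giving a loss of 2 ln 2; training alternates between lowering the source cross-entropy and pushing the discriminator toward that confused state.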
Pub Date: 2025-01-06 | DOI: 10.1007/s12539-024-00681-4
Peng-Cheng Zhao, Xue-Xin Wei, Qiong Wang, Hao-Yang Wang, Bing-Xue Du, Jia-Ning Li, Bei Zhu, Hui Yu, Jian-Yu Shi
In vivo metabolism turns small molecules (e.g., drugs) into metabolites (new molecules), which can raise unexpected safety issues in drug development. However, determining metabolites by biological assays is costly. Recent computational methods offer promising alternatives by predicting possible metabolites. Rule-based methods use predefined reaction-derived rules to infer metabolites, but they cannot handle new metabolic reaction patterns. In contrast, rule-free methods use sequence-to-sequence machine translation to generate metabolites; however, they characterize molecular structures insufficiently and offer weak interpretability. To address these issues in rule-free methods, this manuscript proposes a novel metabolism type-aware graph generative framework (MTGGF) for molecular metabolite prediction. It involves a two-stage learning process: pre-training on a large general chemical reaction dataset, and fine-tuning on three smaller type-specific metabolic reaction datasets. Its core, an elaborate graph-to-graph generative model, treats both atoms and bonds as vertices and each molecule as a bipartite graph, so that it can embed rich information about molecular structure and ensure the integrity of the generated metabolite structures. Comparison with state-of-the-art methods demonstrates its superiority. Furthermore, an ablation study validates the contributions of its two graph encoding components and its reaction-type-specific fine-tuned models. More importantly, based on interactive attention between a molecule and its metabolites, case studies on five approved drugs reveal crucial substructures specific to metabolism types. This framework is anticipated to improve the risk evaluation of drug metabolites. The code is available at https://github.com/zpczaizheli/Metabolite .
MTGGF: A Metabolism Type-Aware Graph Generative Model for Molecular Metabolite Prediction. Interdisciplinary Sciences: Computational Life Sciences, published 2025-01-06.
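To make the atom-and-bond bipartite representation concrete, here is a minimal sketch that turns an atom list and a bond list into a bipartite graph: one vertex set for atoms, one for bonds, and an edge between a bond vertex and each of the two atoms it joins. The input format (element symbols plus `(i, j, order)` bond tuples) and the function names are hypothetical; MTGGF's actual featurization and encoders are not described in the abstract.

```python
from collections import defaultdict

def molecule_to_bipartite(atoms, bonds):
    """Build a bipartite graph whose two vertex sets are the molecule's atoms and bonds.
    atoms: element symbols; bonds: (atom_i, atom_j, bond_order) tuples."""
    vertices = [("atom", i, symbol) for i, symbol in enumerate(atoms)]
    vertices += [("bond", k, order) for k, (_, _, order) in enumerate(bonds)]
    edges = []
    for k, (i, j, _) in enumerate(bonds):
        # Each bond vertex connects to exactly the two atoms it joins.
        edges.append((("atom", i), ("bond", k)))
        edges.append((("atom", j), ("bond", k)))
    return vertices, edges

def neighbors(edges):
    """Undirected adjacency sets keyed by (kind, index)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

# Heavy-atom skeleton of ethanol: C-C-O with two single bonds.
vertices, edges = molecule_to_bipartite(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
print(len(vertices), len(edges))
```

Because bonds become first-class vertices, bond attributes such as order can be embedded and updated just like atom attributes, which fits the abstract's goal of embedding rich structural information.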
Pub Date: 2024-12-27 | DOI: 10.1007/s12539-024-00682-3
Ziyi Han, Yuanyuan Zhang, Lin Liu, Yulin Zhang
Point-of-care diagnosis requires accurate and rapid medical image segmentation, a need that has become increasingly urgent in recent years. Although some pioneering work has applied complex modules to improve segmentation performance, the resulting models are often heavy, which is impractical for point-of-care diagnosis in modern clinical settings. To address these challenges, we propose UltraNet, a state-of-the-art lightweight model that achieves competitive performance in segmenting multiple parts of medical images with the fewest parameters and the lowest computational complexity. To extract sufficient feature information while replacing cumbersome modules, the Shallow Focus Float Block (ShalFoFo) and the Dual-stream Synergy Feature Extraction (DuSem) module are proposed at the shallow and deep levels, respectively. ShalFoFo is designed to capture finer-grained features containing more pixels, while DuSem extracts distinct deep semantic features from two different perspectives. Using them jointly improves the accuracy and stability of UltraNet's segmentation results. UltraNet's generalization ability was evaluated on five datasets covering different tasks. Compared with UNet, UltraNet reduces the number of parameters by a factor of 46 and the computational complexity by a factor of 26. Experimental results demonstrate that UltraNet achieves a state-of-the-art balance among parameters, computational complexity, and segmentation performance. Code is available at https://github.com/Ziii1/UltraNet .
UltraNet: Unleashing the Power of Simplicity for Accurate Medical Image Segmentation. Interdisciplinary Sciences: Computational Life Sciences, published 2024-12-27.
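The abstract does not describe the internals of ShalFoFo or DuSem, so this sketch does not reproduce UltraNet. It only illustrates the arithmetic behind a "fewer parameters than UNet" comparison: it counts the parameters of a standard 2D convolution and of a depthwise-separable convolution, a common lightweight substitute that is not claimed to be what UltraNet uses. The function names are hypothetical.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Standard k x k convolution: k*k*c_in*c_out weights plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def depthwise_separable_params(c_in, c_out, k, bias=True):
    """k x k depthwise conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def reduction_factor(baseline, compact):
    """How many times fewer parameters the compact layer (or model) has."""
    return baseline / compact

standard = conv2d_params(64, 64, 3)
separable = depthwise_separable_params(64, 64, 3)
print(standard, separable, round(reduction_factor(standard, separable), 2))
```

Summing such per-layer counts over a whole network and dividing baseline by compact gives the kind of factor the abstract reports (46 times for parameters); the FLOP-based factor (26 times) would be computed analogously from per-layer multiply-accumulate counts.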
Pub Date: 2024-12-26 | DOI: 10.1007/s12539-024-00679-y
Xi Deng, Lin Liu
The precise spatiotemporal expression of long noncoding RNAs (lncRNAs) plays a pivotal role in biological regulation, and aberrant expression of lncRNAs in different subcellular localizations has been closely linked to the onset and progression of a variety of cancers. Computational methods provide effective means for predicting lncRNA subcellular localization, but current studies either ignore cell-line and tissue specificity or ignore the correlations and shared information among cell lines. In this study, we propose a novel approach, BiGM-lncLoc, which treats the prediction of lncRNA subcellular localization across cell lines as a multi-graph meta-learning task. Our investigation involves two categories of data: localization data of nucleotide sequences in different cell lines, and cell-line expression data. BiGM-lncLoc comprises a cell-line-specific optimization network that learns specific knowledge from cell-line expression data, and a graph neural network optimized across cell lines. The specific and shared knowledge acquired through this bi-level optimization is then applied to new cell-line prediction tasks without re-training or fine-tuning. Additionally, through key-feature analysis of how different nucleotide combinations affect the model, combined with correlation analysis, we confirm the necessity of cell-line-specific studies. Finally, experiments on various cell lines with different data sizes indicate that BiGM-lncLoc outperforms other methods in prediction accuracy, with an average accuracy of 97.7%. After removing overlapping samples to ensure data independence for each cell line, the accuracy ranged from 82.4% to 94.7%, still surpassing existing models. Our code can be found at https://github.com/BioCL1/BiGM-lncLoc .
BiGM-lncLoc: Bi-level Multi-Graph Meta-Learning for Predicting Cell-Specific Long Noncoding RNAs Subcellular Localization. Interdisciplinary Sciences: Computational Life Sciences, published 2024-12-26.
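The bi-level structure that the abstract describes can be illustrated with generic first-order MAML on toy tasks. This is a sketch of the optimization pattern only, not BiGM-lncLoc: an inner loop adapts a shared parameter to each task (here, each "cell line" is a toy 1D regression task), and an outer loop updates the shared initialization using gradients taken at the adapted parameters. The scalar model, learning rates, and function names are all hypothetical. Note that plain MAML adapts to a new task with a few gradient steps, whereas the abstract states that BiGM-lncLoc needs no re-training or fine-tuning on new cell lines.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error of the scalar model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def first_order_maml(tasks, w=0.0, inner_lr=0.05, outer_lr=0.05, epochs=200):
    """tasks: list of (support, query) pairs, each an (xs, ys) tuple.
    Inner level: one task-specific gradient step on the support set.
    Outer level: update the shared w with query-set gradients at the adapted weights
    (first-order approximation: second derivatives are ignored)."""
    for _ in range(epochs):
        meta_grad = 0.0
        for support, query in tasks:
            w_task = w - inner_lr * mse_grad(w, *support)  # task-specific adaptation
            meta_grad += mse_grad(w_task, *query)          # contribution to the shared update
        w -= outer_lr * meta_grad / len(tasks)
    return w

xs = [1.0, 2.0, 3.0]
# Three toy tasks with slopes 1, 2 and 3; support and query sets share inputs here.
tasks = [((xs, [a * x for x in xs]), (xs, [a * x for x in xs])) for a in (1.0, 2.0, 3.0)]
print(first_order_maml(tasks))
```

On these symmetric toy tasks the shared initialization settles at the slope that is, on average, one adaptation step away from every task (2.0), which is the sense in which the outer level captures knowledge shared across tasks.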
Pub Date: 2024-12-23 | DOI: 10.1007/s12539-024-00673-4
Lun Zhu, Zehua Chen, Sen Yang
Cell-penetrating peptides (CPPs) are crucial carriers for drug delivery. Because synthesizing new CPPs in the laboratory is time- and resource-consuming, computational methods that predict potential CPPs can accelerate the development of CPPs for therapy. In this study, EnDM-CPP is proposed, which combines machine learning algorithms (SVM and CatBoost) with convolutional neural networks (CNN and TextCNN). For dataset construction, three previous CPP benchmark datasets, CPPsite 2.0, MLCPP 2.0, and CPP924, are merged to improve diversity and reduce homology. For feature generation, two language-model-based features derived from Transformer architectures, ProtT5 and ESM-2, are fed to the CNN and TextCNN. Additionally, sequence features such as CPRS, Hybrid PseAAC, and KSC are input to the SVM and CatBoost. A logistic regression (LR) model then combines the outputs of these predictors to make the final decision. The experimental results indicate that fused ProtT5 and ESM-2 features contribute significantly to CPP prediction and that combining the employed features and models improves performance. On an independent test set, EnDM-CPP achieved an accuracy of 0.9495 and a Matthews correlation coefficient of 0.9008, improvements of 2.23%-9.48% and 4.32%-19.02%, respectively, over other state-of-the-art methods. Code and data are available at https://github.com/tudou1231/EnDM-CPP.git .
Title: EnDM-CPP: A Multi-view Explainable Framework Based on Deep Learning and Machine Learning for Identifying Cell-Penetrating Peptides with Transformers and Analyzing Sequence Information.
Authors: Lun Zhu, Zehua Chen, Sen Yang
Journal: Interdisciplinary Sciences: Computational Life Sciences
Pub Date: 2024-12-23 | DOI: 10.1007/s12539-024-00673-4 (https://doi.org/10.1007/s12539-024-00673-4)
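The final stage of EnDM-CPP, in which a logistic regression model combines the outputs of the base predictors, is a form of stacking. The sketch below illustrates that idea under stated assumptions: each base model (hypothetically SVM, CatBoost, CNN, and TextCNN) contributes one positive-class probability per peptide, the meta-classifier is a plain L2-regularized logistic regression fitted by batch gradient descent, and the names `StackingLR`, `accuracy`, and `mcc` are illustrative rather than taken from the EnDM-CPP repository. It also computes accuracy and the Matthews correlation coefficient, the two metrics reported in the abstract.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class StackingLR:
    """Hypothetical logistic-regression meta-classifier trained by batch gradient descent.

    Each row of X holds the positive-class probabilities that the base
    predictors (assumed: SVM, CatBoost, CNN, TextCNN) assign to one peptide;
    y holds the 0/1 CPP labels.
    """

    def __init__(self, lr: float = 0.5, epochs: int = 2000, l2: float = 1e-3):
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w: List[float] = []
        self.b = 0.0

    def predict_proba_one(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "StackingLR":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba_one(xi) - yi  # gradient of log loss w.r.t. the logit
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            # Average the gradient and add L2 shrinkage on the weights (not the bias).
            self.w = [wj - self.lr * (g / n + self.l2 * wj) for wj, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict(self, X: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
        return [int(self.predict_proba_one(x) >= threshold) for x in X]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 when undefined)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy out-of-fold probabilities from four hypothetical base models.
    X_meta = [
        [0.9, 0.8, 0.7, 0.95],
        [0.2, 0.3, 0.1, 0.25],
        [0.85, 0.6, 0.9, 0.7],
        [0.1, 0.4, 0.2, 0.15],
    ]
    y = [1, 0, 1, 0]
    meta = StackingLR().fit(X_meta, y)
    pred = meta.predict(X_meta)
    print(pred, accuracy(y, pred), mcc(y, pred))
```

In a real pipeline the meta-features should be out-of-fold predictions (for example from k-fold cross-validation), so the logistic regression never sees base-model outputs on the base models' own training data; scikit-learn's `StackingClassifier` with a `LogisticRegression` final estimator packages the same pattern.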
Pub Date: 2024-12-23 | DOI: 10.1007/s12539-024-00677-0
Haixia Zhai, Chengyao Dong, Tao Wang, Junwei Luo
Structural variation (SV) is an important component of human genome diversity. Many studies have shown that SVs have a significant impact on human disease and are strongly associated with cancer development. In recent years, Hi-C sequencing has proven useful for detecting large-scale SVs, and several methods have been proposed for identifying SVs from Hi-C data. However, owing to the complexity of 3D genome structure, accurately identifying SVs from the Hi-C contact matrix remains challenging. Here, we present HiSVision, a method for identifying large-scale SVs from Hi-C data using a detection transformer framework. Inspired by object detection networks, we transform the Hi-C contact matrix into images, use a detection transformer to identify candidate SV regions in these images, and finally filter SVs based on features around the breakpoints. Experimental results show that HiSVision outperforms existing methods in precision and F1 score on cancer cell lines and simulated datasets. The source code and data are available from https://github.com/dcy99/HiSVision .
Title: HiSVision: A Method for Detecting Large-Scale Structural Variations Based on Hi-C Data and Detection Transformer.
Authors: Haixia Zhai, Chengyao Dong, Tao Wang, Junwei Luo
Journal: Interdisciplinary Sciences: Computational Life Sciences
Pub Date: 2024-12-23 | DOI: 10.1007/s12539-024-00677-0 (https://doi.org/10.1007/s12539-024-00677-0)
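HiSVision treats the Hi-C contact matrix as an image before applying a detection transformer. The detector itself is beyond a short example, but the matrix-to-image step can be sketched as follows. The preprocessing is an assumption, not the published pipeline: counts are compressed with `log1p`, saturated at a nearest-rank percentile (the hypothetical `clip_percentile` parameter), scaled to 8-bit grayscale, and cut into fixed-size windows by a hypothetical `sliding_tiles` helper that records each tile's bin offset so that detected boxes could be mapped back to genomic bins.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[int]]


def contact_matrix_to_image(
    matrix: Sequence[Sequence[float]], clip_percentile: float = 99.0
) -> Image:
    """Map a square Hi-C contact matrix to an 8-bit grayscale image.

    Assumed preprocessing (illustrative only): log1p compression of raw
    counts, saturation at a high percentile, then linear scaling to 0-255.
    """
    n = len(matrix)
    if n == 0:
        return []
    if any(len(row) != n for row in matrix):
        raise ValueError("contact matrix must be square")
    logged = [[math.log1p(max(v, 0.0)) for v in row] for row in matrix]
    flat = sorted(v for row in logged for v in row)
    # Nearest-rank percentile used as the saturation ceiling.
    rank = math.ceil(clip_percentile / 100.0 * len(flat))
    ceiling = flat[min(max(rank - 1, 0), len(flat) - 1)]
    if ceiling <= 0.0:
        return [[0] * n for _ in range(n)]
    return [[round(255 * min(v, ceiling) / ceiling) for v in row] for row in logged]


def sliding_tiles(image: Image, size: int, stride: int) -> List[Tuple[Tuple[int, int], Image]]:
    """Cut the image into size x size windows; each tile keeps its top-left bin offset."""
    n = len(image)
    starts = range(0, max(n - size, 0) + 1, stride)
    return [
        ((i, j), [row[j:j + size] for row in image[i:i + size]])
        for i in starts
        for j in starts
    ]


if __name__ == "__main__":
    toy = [
        [100, 10, 0, 0],
        [10, 100, 5, 0],
        [0, 5, 100, 40],
        [0, 0, 40, 100],
    ]
    img = contact_matrix_to_image(toy)
    for (i, j), tile in sliding_tiles(img, size=2, stride=2):
        print((i, j), tile)
```

Percentile clipping is included because contact counts near the diagonal are far larger than those elsewhere; without it, weaker patterns away from the diagonal, where signals of large SVs tend to appear, would be squeezed into a few gray levels.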
High-throughput sequencing has led to an exponential increase in the number of known peptide sequences, creating a need for computational methods that identify multi-functional therapeutic peptides (MFTP) from their sequences. However, existing computational methods are hampered by class imbalance, particularly when learning effective sequence representations. To address this, we propose PSCFA, a method that combines prototypical supervised contrastive learning with feature augmentation for MFTP prediction. We employ a two-stage training scheme that trains the feature extractor and the classifier separately, based on the principle that better feature representations improve classification accuracy. In the first stage, a prototypical supervised contrastive learning strategy improves the uniformity of the feature space, so that features of samples from the same category are tightly clustered while those from different categories are more dispersed. In the second stage, a feature augmentation strategy focused on infrequent labels (tail labels) refines classifier training: a prototype-based variational autoencoder captures the semantic links between common labels (head labels) and their prototypes, and this knowledge is then transferred to tail labels to generate augmented features for classifier training. Experiments show that PSCFA significantly outperforms existing methods for MFTP prediction, advancing therapeutic peptide identification.
Title: Identification of Multi-functional Therapeutic Peptides Based on Prototypical Supervised Contrastive Learning.
Authors: Sitong Niu, Henghui Fan, Fei Wang, Xiaomei Yang, Junfeng Xia
Journal: Interdisciplinary Sciences: Computational Life Sciences
Pub Date: 2024-12-23 | DOI: 10.1007/s12539-024-00674-3 (https://doi.org/10.1007/s12539-024-00674-3)
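The first training stage of PSCFA pulls samples toward prototypes of their own category. A minimal sketch of a generic prototype-based contrastive loss follows, under simplifying assumptions that do not come from the paper: each sample carries a single label (real MFTP data are multi-label), a prototype is the re-normalized mean of a class's L2-normalized features, and the loss is a softmax cross-entropy over temperature-scaled cosine similarities to all prototypes. The function names are hypothetical, and the VAE-based tail-label augmentation of the second stage is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # leave zero vectors unchanged
    return [x / norm for x in v]


def class_prototypes(features: Sequence[Vector], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Prototype of each class = re-normalized mean of its L2-normalized features."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        z = l2_normalize(f)
        acc = sums.setdefault(y, [0.0] * len(z))
        for k, x in enumerate(z):
            acc[k] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: l2_normalize([s / counts[y] for s in acc]) for y, acc in sums.items()}


def prototype_contrastive_loss(
    features: Sequence[Vector],
    labels: Sequence[int],
    prototypes: Dict[int, List[float]],
    temperature: float = 0.1,
) -> float:
    """Mean softmax cross-entropy of each sample over cosine similarities to all prototypes.

    Minimizing it pulls a sample toward its own class prototype and pushes it
    away from the other prototypes; a lower temperature sharpens the contrast.
    """
    classes = sorted(prototypes)
    total = 0.0
    for f, y in zip(features, labels):
        z = l2_normalize(f)
        logits = [sum(a * b for a, b in zip(z, prototypes[c])) / temperature for c in classes]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits))
        total += log_denom - logits[classes.index(y)]
    return total / len(features)


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    labels = [0, 0, 1, 1]
    protos = class_prototypes(feats, labels)
    print(prototype_contrastive_loss(feats, labels, protos))
    print(prototype_contrastive_loss(feats, [1, 1, 0, 0], protos))  # mislabeled: larger loss
```

Comparing each sample with one prototype per class, rather than with every other sample in the batch, keeps the cost of this sketch proportional to the number of classes.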