Bangyi Zhang;Yun Zuo;Zhiqiang Dai;Sifan Zhu;Xuan Liu;Zhaohong Deng
{"title":"MBSCLoc:基于聚类平衡子空间划分方法和多类对比表征学习的多标签亚细胞定位预测。","authors":"Bangyi Zhang;Yun Zuo;Zhiqiang Dai;Sifan Zhu;Xuan Liu;Zhaohong Deng","doi":"10.1109/JBHI.2025.3537284","DOIUrl":null,"url":null,"abstract":"mRNA subcellular localization is a prevalent and essential mechanism that precisely regulates protein translation and significantly impacts various cellular processes. mRNA subcellular localization has advanced the understanding of mRNA function, yet existing methods face limitations, including imbalanced data, suboptimal model performance, and inadequate generalization, particularly in multi-label localization scenarios where solutions are scarce. This study introduces MBSCLoc, a predictor for mRNA multi-label subcellular localization. MBSCLoc predicts mRNA locations across multiple cellular compartments simultaneously, overcoming challenges like single-location prediction, incomplete feature extraction, and imbalanced data. MBSCLoc leverages UTR-LM model for feature extraction, followed by multi-class contrastive representation learning and Clustering Balanced Subspace Partitioning to construct balanced subspaces. It then optimizes sample distribution to tackle severe data imbalance and uses multiple XGBoost classifiers, integrated through voting, to enhance accuracy and generalization. Five-fold cross-validation and independent testing results show that MBSCLoc significantly outperforms other methods. Additionally, MBSCLoc offers superior pixel-level interpretability, strongly supporting mRNA multi-label subcellular localization research. Crucially, the importance of the 5' UTR and 3' UTR regions has been preliminarily confirmed using traditional biological analysis and Tree-SHAP, with most mRNA sequences showing significant relevance in these regions, especially the 3' UTR where about 80% of specific sites reach peak significance.","PeriodicalId":13073,"journal":{"name":"IEEE Journal of Biomedical and Health Informatics","volume":"29 10","pages":"7020-7033"},"PeriodicalIF":7.7000,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"MBSCLoc: Multi-Label Subcellular Localization Predict Based on Cluster Balanced Subspace Partitioning Method and Multi-Class Contrastive Representation Learning\",\"authors\":\"Bangyi Zhang;Yun Zuo;Zhiqiang Dai;Sifan Zhu;Xuan Liu;Zhaohong Deng\",\"doi\":\"10.1109/JBHI.2025.3537284\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"mRNA subcellular localization is a prevalent and essential mechanism that precisely regulates protein translation and significantly impacts various cellular processes. mRNA subcellular localization has advanced the understanding of mRNA function, yet existing methods face limitations, including imbalanced data, suboptimal model performance, and inadequate generalization, particularly in multi-label localization scenarios where solutions are scarce. This study introduces MBSCLoc, a predictor for mRNA multi-label subcellular localization. MBSCLoc predicts mRNA locations across multiple cellular compartments simultaneously, overcoming challenges like single-location prediction, incomplete feature extraction, and imbalanced data. MBSCLoc leverages UTR-LM model for feature extraction, followed by multi-class contrastive representation learning and Clustering Balanced Subspace Partitioning to construct balanced subspaces. It then optimizes sample distribution to tackle severe data imbalance and uses multiple XGBoost classifiers, integrated through voting, to enhance accuracy and generalization. Five-fold cross-validation and independent testing results show that MBSCLoc significantly outperforms other methods. Additionally, MBSCLoc offers superior pixel-level interpretability, strongly supporting mRNA multi-label subcellular localization research. Crucially, the importance of the 5' UTR and 3' UTR regions has been preliminarily confirmed using traditional biological analysis and Tree-SHAP, with most mRNA sequences showing significant relevance in these regions, especially the 3' UTR where about 80% of specific sites reach peak significance.\",\"PeriodicalId\":13073,\"journal\":{\"name\":\"IEEE Journal of Biomedical and Health Informatics\",\"volume\":\"29 10\",\"pages\":\"7020-7033\"},\"PeriodicalIF\":7.7000,\"publicationDate\":\"2025-01-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Journal of Biomedical and Health Informatics\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10858869/\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Biomedical and Health Informatics","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10858869/","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
MBSCLoc: Multi-Label Subcellular Localization Predict Based on Cluster Balanced Subspace Partitioning Method and Multi-Class Contrastive Representation Learning
mRNA subcellular localization is a prevalent and essential mechanism that precisely regulates protein translation and significantly impacts various cellular processes. mRNA subcellular localization has advanced the understanding of mRNA function, yet existing methods face limitations, including imbalanced data, suboptimal model performance, and inadequate generalization, particularly in multi-label localization scenarios where solutions are scarce. This study introduces MBSCLoc, a predictor for mRNA multi-label subcellular localization. MBSCLoc predicts mRNA locations across multiple cellular compartments simultaneously, overcoming challenges like single-location prediction, incomplete feature extraction, and imbalanced data. MBSCLoc leverages UTR-LM model for feature extraction, followed by multi-class contrastive representation learning and Clustering Balanced Subspace Partitioning to construct balanced subspaces. It then optimizes sample distribution to tackle severe data imbalance and uses multiple XGBoost classifiers, integrated through voting, to enhance accuracy and generalization. Five-fold cross-validation and independent testing results show that MBSCLoc significantly outperforms other methods. Additionally, MBSCLoc offers superior pixel-level interpretability, strongly supporting mRNA multi-label subcellular localization research. Crucially, the importance of the 5' UTR and 3' UTR regions has been preliminarily confirmed using traditional biological analysis and Tree-SHAP, with most mRNA sequences showing significant relevance in these regions, especially the 3' UTR where about 80% of specific sites reach peak significance.
期刊介绍:
IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.