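MBSCLoc: Multi-Label Subcellular Localization Predict Based on Cluster Balanced Subspace Partitioning Method and Multi-Class Contrastive Representation Learning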

IF 7.7; Zone 2 (Medicine); JCR Q1 (Computer Science, Information Systems); IEEE Journal of Biomedical and Health Informatics; Pub Date: 2025-01-31; DOI: 10.1109/JBHI.2025.3537284
Bangyi Zhang;Yun Zuo;Zhiqiang Dai;Sifan Zhu;Xuan Liu;Zhaohong Deng
Journal Article. Volume 29, issue 10, pages 7020-7033. Full record: https://ieeexplore.ieee.org/document/10858869/
Citations: 0

Abstract

mRNA subcellular localization is a prevalent and essential mechanism that precisely regulates protein translation and significantly impacts various cellular processes. Studying it has advanced the understanding of mRNA function, yet existing prediction methods face limitations, including imbalanced data, suboptimal model performance, and inadequate generalization, particularly in multi-label localization scenarios, where solutions are scarce. This study introduces MBSCLoc, a predictor for multi-label mRNA subcellular localization. MBSCLoc predicts mRNA locations across multiple cellular compartments simultaneously, addressing the restriction to single-location prediction, incomplete feature extraction, and imbalanced data. MBSCLoc uses the UTR-LM model for feature extraction, followed by multi-class contrastive representation learning and Clustering Balanced Subspace Partitioning to construct balanced subspaces. It then optimizes the sample distribution to tackle severe data imbalance and uses multiple XGBoost classifiers, integrated through voting, to enhance accuracy and generalization. Five-fold cross-validation and independent testing show that MBSCLoc significantly outperforms other methods. Additionally, MBSCLoc offers superior pixel-level interpretability, strongly supporting research on multi-label mRNA subcellular localization. Crucially, the importance of the 5' UTR and 3' UTR regions has been preliminarily confirmed using traditional biological analysis and Tree-SHAP: most mRNA sequences show significant relevance in these regions, especially the 3' UTR, where about 80% of specific sites reach peak significance. To make MBSCLoc convenient for researchers to use, a freely accessible website has also been developed: http://www.mbscloc.com/.
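The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind training a voting ensemble on balanced subspaces, not the authors' method. It makes several assumptions: the majority class of a single binary label is clustered with a tiny k-means, the clustered samples are dealt round-robin into subspaces, and every subspace is paired with all minority samples. A nearest-centroid rule stands in for each XGBoost classifier, random 2-D points stand in for UTR-LM features, and the contrastive learning step is omitted. All function names (`kmeans`, `balanced_subspaces`, `fit_centroids`, `vote`) are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean(members)
    return assign


def balanced_subspaces(majority, minority, n_subspaces, k=3, seed=0):
    """Assumed partitioning scheme: order majority samples by cluster,
    deal them round-robin into n_subspaces chunks, and pair each chunk
    with all minority samples."""
    assign = kmeans(majority, k, seed=seed)
    ordered = [p for c in range(k)
               for p, a in zip(majority, assign) if a == c]
    chunks = [ordered[i::n_subspaces] for i in range(n_subspaces)]
    return [(chunk, list(minority)) for chunk in chunks]


def fit_centroids(neg, pos):
    """Stand-in base learner (NOT XGBoost): one centroid per class."""
    return mean(neg), mean(pos)


def predict_one(model, x):
    """Predict 1 if x is closer to the positive centroid, else 0."""
    neg_c, pos_c = model
    return int(sq_dist(x, pos_c) < sq_dist(x, neg_c))


def vote(models, x):
    """Majority vote over the base learners (ties count as 0)."""
    return int(2 * sum(predict_one(m, x) for m in models) > len(models))


# Toy data: 90 negatives near (0, 0) and 10 positives near (5, 5).
rng = random.Random(0)
negatives = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
positives = [(rng.gauss(5, 1), rng.gauss(5, 1)) for _ in range(10)]

subspaces = balanced_subspaces(negatives, positives, n_subspaces=9)
models = [fit_centroids(neg, pos) for neg, pos in subspaces]
print([len(neg) for neg, _ in subspaces])  # negatives per subspace
print(vote(models, (5.0, 5.0)), vote(models, (0.0, 0.0)))
```

In this sketch each base learner sees a 10-to-10 class ratio instead of the original 90-to-10. A multi-label setting could repeat the procedure once per cellular compartment, though the abstract does not state whether MBSCLoc does so.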
Source journal
IEEE Journal of Biomedical and Health Informatics
Categories: Computer Science, Information Systems; Computer Science, Interdisciplinary Applications
CiteScore: 13.60
Self-citation rate: 6.50%
Articles published: 1151
About the journal: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics like interoperability, evidence-based medicine, and secure patient data are also addressed.