Multi-Label Classification Based On Subcellular Region-Guided Feature Description For Protein Localisation
Priyanka S. Rana, E. Meijering, A. Sowmya, Yang Song
2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)
Publication date: 2021-04-13
DOI: 10.1109/ISBI48211.2021.9434145 (https://doi.org/10.1109/ISBI48211.2021.9434145)
Citations: 4
Abstract
In this paper, we present a multi-label classification pipeline and a novel feature descriptor for protein subcellular localisation. The challenge is to develop a computational model that can classify multi-site proteins from multi-label images in a highly imbalanced dataset with a long-tail distribution. To address this challenge, we design a Location-Sorted Random Projections feature descriptor that represents the image intensity and gradient of the protein of interest with reference to the correlated cellular region. The Multilabel Synthetic Minority Over-sampling Technique is optimised to generate synthetic features with labels to handle class imbalance. Our method achieves state-of-the-art performance on a large-scale public dataset and demonstrates excellent performance on the minority classes.
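
The abstract does not say how the authors optimise the Multilabel Synthetic Minority Over-sampling Technique, so the following is only a minimal Python sketch of the general idea: interpolate between a sample carrying a minority label and one of its nearest neighbours, then assign the synthetic label set by majority vote. The function name `mlsmote_sketch`, its parameters, the Euclidean neighbour search, and the majority-vote threshold are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def mlsmote_sketch(X, Y, label, n_new, k=5, rng=None):
    """Generate n_new synthetic (feature, label-set) pairs for one minority label.

    X: (n_samples, n_features) array of feature descriptors.
    Y: (n_samples, n_labels) binary indicator array.
    label: column index of the minority label to oversample (assumed known).
    """
    rng = np.random.default_rng(rng)
    idx = np.flatnonzero(Y[:, label] == 1)  # samples carrying the minority label
    if len(idx) < 2:
        raise ValueError("need at least two samples with the label")
    k = min(k, len(idx) - 1)
    Xm, Ym = X[idx], Y[idx]

    # Pairwise Euclidean distances within the minority subset.
    dist = np.linalg.norm(Xm[:, None, :] - Xm[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    X_new = np.empty((n_new, X.shape[1]))
    Y_new = np.empty((n_new, Y.shape[1]), dtype=Y.dtype)
    for i in range(n_new):
        seed = rng.integers(len(idx))
        ref = rng.choice(neighbours[seed])
        gap = rng.random()
        # Synthetic features lie on the segment between seed and neighbour.
        X_new[i] = Xm[seed] + gap * (Xm[ref] - Xm[seed])
        # Keep a label if more than half of seed + neighbours carry it.
        votes = Ym[seed] + Ym[neighbours[seed]].sum(axis=0)
        Y_new[i] = (votes > (k + 1) / 2).astype(Y.dtype)
    return X_new, Y_new


# Toy usage: label 1 is the minority class (two samples only).
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [5., 5.], [6., 5.]])
Y = np.array([[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [1, 1]])
X_syn, Y_syn = mlsmote_sketch(X, Y, label=1, n_new=4, k=5, rng=0)
```

Because every seed and neighbour is drawn from the samples that already carry the target label, each synthetic sample keeps that label; the vote only decides which co-occurring labels are copied along. In a pipeline like the one described, the generated pairs would be appended to the training features before fitting the classifier.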