Protein sub-cellular localization based on noise-intensity-weighted linear discriminant analysis and an improved k-nearest-neighbor classifier
Authors: Zhenfeng Lei, Shunfang Wang, Dongshu Xu
Published in: 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2016-10-01
DOI: 10.1109/CISP-BMEI.2016.7853022 (https://doi.org/10.1109/CISP-BMEI.2016.7853022)
Citations: 5
Abstract
Dimensionality reduction and classification are the key steps in protein sub-cellular localization. With the rapid development of biological science and technology, a large amount of high-dimensional biological data has been generated, and it is accompanied by noise. Representing high-dimensional data in a low-dimensional space while achieving better classification has become one of the significant tasks for researchers working on protein sub-cellular localization. Neither linear discriminant analysis (LDA), a traditional dimensionality reduction algorithm, nor the popular k-nearest-neighbor (KNN) classifier meets the needs of this application well when used without improvement. LDA seeks a projection direction along which the projected samples of different classes lie as far apart as possible. However, noise enlarges the within-class distance and makes the classes hard to separate even with LDA. In addition, KNN does not properly account for the inequality of samples. Therefore, this paper first uses noise intensity as a weight in LDA, and then improves the KNN algorithm with a within-class KNN method that accounts for the inequality of samples from different classes. Jackknife tests show that the method combining these two improvements is feasible and effective for classification.
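
As a rough illustration of the classification and validation steps, the following Python sketch shows one possible reading of a within-class KNN evaluated with a jackknife (leave-one-out) test. The abstract does not define the within-class rule, so the decision rule used here (choose the class whose k nearest members have the smallest mean distance to the query) is an assumption. The function names `within_class_knn` and `jackknife_accuracy`, the class labels and the toy data are also hypothetical. The noise-intensity-weighted LDA step is left out because the abstract does not state how the weights are computed.

```python
import math


def within_class_knn(train, query, k=3):
    """Assign `query` to the class whose k nearest members are closest on average.

    `train` is a list of (feature_vector, label) pairs. Neighbours are searched
    separately inside each class, so a large class cannot simply outvote a
    small one. This rule is an assumed reading of "within-class KNN".
    """
    distances_by_class = {}
    for features, label in train:
        distances_by_class.setdefault(label, []).append(math.dist(features, query))

    best_label, best_score = None, math.inf
    for label, distances in distances_by_class.items():
        nearest = sorted(distances)[:k]  # fewer than k if the class is small
        score = sum(nearest) / len(nearest)
        if score < best_score:
            best_label, best_score = label, score
    return best_label


def jackknife_accuracy(data, k=3):
    """Leave-one-out test: predict each sample from all remaining samples."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if within_class_knn(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Made-up, imbalanced 2-D toy data (not from the paper).
data = [
    ((0.0, 0.0), "cytoplasm"), ((0.2, 0.1), "cytoplasm"), ((0.1, 0.3), "cytoplasm"),
    ((0.3, 0.2), "cytoplasm"), ((0.2, 0.4), "cytoplasm"), ((0.4, 0.1), "cytoplasm"),
    ((2.0, 2.0), "nucleus"), ((2.2, 1.9), "nucleus"), ((1.9, 2.1), "nucleus"),
]

if __name__ == "__main__":
    print(within_class_knn(data, (2.1, 2.0), k=3))
    print(jackknife_accuracy(data, k=3))
```

In a full pipeline the feature vectors passed to `within_class_knn` would be the low-dimensional projections produced by the dimensionality reduction step rather than raw features.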