A Modified Binary Flower Pollination Algorithm: A Fast and Effective Combination of Feature Selection Techniques for SNP Classification
Wanthanee Rathasamuth, Kitsuchart Pasupa
2019 11th International Conference on Information Technology and Electrical Engineering (ICITEE), pp. 1-6, October 2019
DOI: 10.1109/ICITEED.2019.8929963
Citations: 4
Abstract
Single nucleotide polymorphisms (SNPs) are single-base variations in the genome that contribute to differences in the characteristics of individuals within a species. Machine learning has been applied to classify swine breeds from their SNPs. However, because the number of samples (pigs) is usually much smaller than the number of features (SNPs), classifiers are prone to overfitting. Feature selection is therefore applied to the full SNP set to reduce it to a much smaller subset of the most significant SNPs for classification. In this study, we combined information gain with the binary flower pollination algorithm (BFPA) for feature selection, together with a cut-off-point-finding threshold that assigns a 0 or 1 to each position of the solution vector and a genetic algorithm (GA) bit-flip mutation operator. We called this combination Modified-BFPA and used a support vector machine (SVM) as the classifier. Evaluated against a few other feature selection techniques, our combination was at least competitive with them. It selected only 1.76% of the 10,210 SNPs (roughly 180 SNPs), and the selected SNPs yielded 95.12% classification accuracy. It was also fast: combined with the SVM, it needed on average only 1.60 iterations to find the SNP subset giving the highest classification accuracy.
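To make the information-gain component concrete, here is a minimal sketch of scoring each SNP by IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v), where Y is the breed label. It assumes genotypes are coded as discrete values (e.g. 0/1/2) and that IG is used to rank or pre-filter SNPs before the BFPA search; the abstract does not say exactly how the two are combined, and the helper names (`entropy`, `information_gain`, `rank_snps`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for one discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_snps(genotypes, labels):
    """Return (SNP indices sorted by decreasing IG, per-SNP IG scores).

    genotypes: one row per sample, one (assumed 0/1/2) code per SNP.
    """
    columns = list(zip(*genotypes))
    scores = [information_gain(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data: 4 pigs, 3 SNPs, 2 breeds (illustrative only).
genotypes = [[0, 2, 1], [0, 1, 1], [2, 2, 0], [2, 1, 0]]
labels = ["A", "A", "B", "B"]
order, scores = rank_snps(genotypes, labels)
```

In this toy example SNPs 0 and 2 separate the breeds perfectly (IG of 1 bit each) while SNP 1 carries no information, so a filter keeping the top-ranked SNPs would discard SNP 1.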
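The search loop can be sketched as follows, under stated assumptions. The abstract gives no update equations, so this uses the standard flower pollination algorithm (global pollination by a Lévy flight toward the best flower, generated with Mantegna's method; local pollination mixing two random flowers; switch probability `p_switch`). A sigmoid transfer with a fixed `threshold` stands in for the paper's cut-off-point-finding threshold, whose procedure is not described here, and a GA bit-flip mutation is applied to each binarized solution. The fitness is a pluggable function where the paper uses SVM classification accuracy; all names and parameter values are hypothetical.

```python
import math
import random


def levy_step(beta=1.5, rng=random):
    """One Levy-flight step length via Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = rng.gauss(0.0, 1.0)
    return u / max(abs(v), 1e-12) ** (1 / beta)  # guard against v == 0


def binarize(position, threshold=0.5):
    """Sigmoid transfer, then a fixed cut-off (placeholder for the paper's threshold search)."""
    return [1 if 1.0 / (1.0 + math.exp(-x)) > threshold else 0 for x in position]


def bit_flip(bits, rate, rng=random):
    """GA bit-flip mutation: flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def modified_bfpa(n_features, fitness, n_flowers=10, n_iter=20, p_switch=0.8,
                  threshold=0.5, mutation_rate=0.01, bound=10.0, seed=0):
    """Return (best SNP mask, its fitness). `fitness` maps a 0/1 mask to a score."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(n_features)] for _ in range(n_flowers)]
    masks = [bit_flip(binarize(p, threshold), mutation_rate, rng) for p in pos]
    scores = [fitness(m) for m in masks]
    best = max(range(n_flowers), key=lambda i: scores[i])
    g_pos, g_mask, g_score = pos[best][:], masks[best][:], scores[best]
    for _ in range(n_iter):
        for i in range(n_flowers):
            if rng.random() < p_switch:
                # Global pollination: Levy flight toward the current best flower g*.
                new = [x + levy_step(rng=rng) * (g - x) for x, g in zip(pos[i], g_pos)]
            else:
                # Local pollination: move by a random fraction of two other flowers' difference.
                j, k = rng.sample(range(n_flowers), 2)
                eps = rng.random()
                new = [x + eps * (a - b) for x, a, b in zip(pos[i], pos[j], pos[k])]
            new = [min(max(x, -bound), bound) for x in new]  # keep exp() in range
            mask = bit_flip(binarize(new, threshold), mutation_rate, rng)
            score = fitness(mask)
            if score > scores[i]:  # greedy replacement, as in the standard FPA
                pos[i], masks[i], scores[i] = new, mask, score
                if score > g_score:
                    g_pos, g_mask, g_score = new[:], mask[:], score
    return g_mask, g_score


# Toy fitness standing in for SVM accuracy: reward 3 hypothetical
# "informative" SNPs and penalise the size of the selected subset.
TARGET = {0, 1, 2}


def toy_fitness(mask):
    return sum(mask[j] for j in TARGET) - 0.1 * sum(mask)


mask, score = modified_bfpa(20, toy_fitness, seed=1)
```

The size penalty in `toy_fitness` mimics the goal of keeping only a small fraction of SNPs; with a real SVM-based fitness, each evaluation would train and score a classifier on the selected columns, which is why needing few iterations matters.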