A Modified Binary Flower Pollination Algorithm: A Fast and Effective Combination of Feature Selection Techniques for SNP Classification

Wanthanee Rathasamuth, Kitsuchart Pasupa
DOI: 10.1109/ICITEED.2019.8929963 (https://doi.org/10.1109/ICITEED.2019.8929963)
Published in: 2019 11th International Conference on Information Technology and Electrical Engineering (ICITEE), pp. 1-6, 2019-10-01
Citations: 4

Abstract

A single nucleotide polymorphism (SNP) is a genetic trait responsible for differences in the characteristics of individuals of a living species. Machine learning has been used to classify swine breeds according to their SNPs. However, because the number of samples (pigs sampled) is usually much smaller than the number of features (SNPs), overfitting may occur. Feature selection techniques are therefore applied to the full set of SNPs to reduce it to a much smaller set of the most significant SNPs for classification. In this study, we combined information gain with a binary flower pollination algorithm for feature selection, together with a cut-off-point-finding threshold that assigns a 0 or 1 value to each position in the solution vector and a GA bit-flip mutation operator. We call the combination Modified-BFPA. The classifier was an SVM. Evaluated against a few other feature selection techniques, our combination was at least competitive with them. It selected only 1.76% of the 10,210 SNPs as the most significant, and the selected SNPs gave 95.12% classification accuracy. It was also fast: on average, 1.60 iterations in combination with the SVM sufficed to find the set of SNPs that gave the highest classification accuracy.
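Three of the building blocks named in the abstract can be illustrated with a minimal Python sketch: information-gain scoring of a discrete feature, thresholding a continuous solution vector into a 0/1 selection mask, and GA-style bit-flip mutation. This is not the authors' implementation. The function names, the fixed `threshold` argument (the paper searches for a cut-off point, which is not reproduced here), the mutation `rate`, and the toy genotype data are all assumptions made for illustration. The flower pollination update itself and the SVM evaluation are omitted.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditioned on a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def binarize(position, threshold):
    """Map a continuous solution vector to a 0/1 mask (1 = feature selected).

    Assumption: a fixed threshold stands in for the paper's cut-off-point search.
    """
    return [1 if p > threshold else 0 for p in position]


def bit_flip(mask, rate, rng):
    """Flip each bit independently with probability `rate` (GA bit-flip mutation)."""
    return [1 - b if rng.random() < rate else b for b in mask]


if __name__ == "__main__":
    # Hypothetical toy data: genotypes coded 0/2 for four pigs of two breeds.
    breeds = ["A", "A", "B", "B"]
    snp_informative = [0, 0, 2, 2]    # separates the breeds perfectly
    snp_uninformative = [0, 2, 0, 2]  # carries no breed information
    print(information_gain(snp_informative, breeds))
    print(information_gain(snp_uninformative, breeds))

    rng = random.Random(0)
    mask = binarize([0.2, 0.7, 0.5, 0.9], threshold=0.5)
    print(mask, bit_flip(mask, rate=0.25, rng=rng))
```

In a pipeline of the kind the abstract describes, information gain would presumably be used to rank or pre-filter SNPs before the binary search, and each candidate mask would be scored by training the classifier on the selected columns. That ordering is an inference from the abstract, not a detail it states.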