Optimization of Spaced K-mer Frequency Feature Extraction using Genetic Algorithms for Metagenome Fragment Classification
A. Pekuwali, W. Kusuma, A. Buono
Journal of ICT Research and Applications, published 2018-09-28
DOI: 10.5614/ITBJ.ICT.RES.APPL.2018.12.2.2
Citations: 3
Abstract
K-mer frequencies are commonly used to extract features from metagenome fragments. Despite this, researchers have found that their use is still inefficient. In this research, a genetic algorithm was employed to find optimal spaced k-mers. These were obtained by generating the possible combinations of match positions and don't-care positions (written as *). This approach was adopted from the concept of spaced seeds in PatternHunter. The use of spaced k-mers can reduce the dimension of the k-mer frequency feature. To measure the accuracy of the proposed method, we used the naive Bayesian classifier (NBC). The results showed that the chromosome 111111110001, representing the spaced k-mer model [111 1111 10001], was the best chromosome, with a higher fitness (85.42) than that of the k-mer frequency feature. Moreover, the proposed approach also reduced the feature extraction time.
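To make the idea of match and don't-care positions concrete, the following is a minimal sketch of spaced k-mer counting, not the authors' implementation. It assumes that a `1` in the mask marks a match position whose base is kept and a `0` marks a don't-care position whose base is skipped. The function name `spaced_kmer_counts` and the example fragment are hypothetical.

```python
from collections import Counter


def spaced_kmer_counts(sequence, mask):
    """Count spaced k-mers in a sequence.

    Assumption: '1' in the mask is a match position (base kept),
    '0' is a don't-care position (base ignored).
    """
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    span = len(mask)
    counts = Counter()
    # Slide a window of the mask's full length over the sequence.
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        key = "".join(window[i] for i in keep)
        # Skip windows with ambiguous bases (e.g. N) at match positions.
        if set(key) <= set("ACGT"):
            counts[key] += 1
    return counts


mask = "111111110001"          # best chromosome reported in the abstract
fragment = "ACGTACGTACGTACGT"  # made-up fragment, for illustration only
counts = spaced_kmer_counts(fragment, mask)

weight = mask.count("1")       # number of match positions
print(counts)
print("dimension with mask:", 4 ** weight)
print("dimension of contiguous 12-mers:", 4 ** len(mask))
```

Under this reading, the mask spans 12 positions but keeps only 9 of them, so the feature space has 4^9 = 262,144 possible entries instead of 4^12 = 16,777,216 for contiguous 12-mers. This is the kind of dimension reduction the abstract describes.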
About the journal:
Journal of ICT Research and Applications welcomes full research articles in the area of Information and Communication Technology from the following subject areas: Information Theory, Signal Processing, Electronics, Computer Network, Telecommunication, Wireless & Mobile Computing, Internet Technology, Multimedia, Software Engineering, Computer Science, Information System and Knowledge Management. Authors are invited to submit articles that have not been published previously and are not under consideration elsewhere.