Optimization of Spaced K-mer Frequency Feature Extraction using Genetic Algorithms for Metagenome Fragment Classification

Journal of ICT Research and Applications (IF 0.5, Q4, Computer Science, Information Systems). Pub Date: 2018-09-28. DOI: 10.5614/ITBJ.ICT.RES.APPL.2018.12.2.2
A. Pekuwali, W. Kusuma, A. Buono
Citation count: 3

Abstract

K-mer frequencies are commonly used to extract features from metagenome fragments. Nevertheless, researchers have found that their use is still inefficient. In this research, a genetic algorithm was employed to find optimally spaced k-mers. These were obtained by generating the possible combinations of match positions and don't-care positions (written as *). This approach was adopted from the concept of spaced seeds in PatternHunter. The use of spaced k-mers could reduce the dimension of the k-mer frequency feature. To measure the accuracy of the proposed method, we used the naive Bayesian classifier (NBC). The results showed that chromosome 111111110001, representing the spaced k-mer model [111 1111 10001], was the best chromosome, with a higher fitness (85.42) than that of the k-mer frequency feature. Moreover, the proposed approach also reduced the feature extraction time.
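To make the dimension reduction concrete, the following is a minimal Python sketch of spaced k-mer frequency extraction. It is not the authors' implementation. It assumes that in a mask `1` marks a match position and `0` a don't-care position (as in PatternHunter-style seeds), and that the chromosome 111111110001 splits into the three masks shown in the bracketed model. The function names `spaced_kmer_counts` and `feature_vector` and the example fragment are hypothetical.

```python
from collections import Counter
from itertools import product


def spaced_kmer_counts(seq: str, mask: str) -> Counter:
    """Count spaced k-mers in seq; assumes '1' = match, '0' = don't care."""
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    counts = Counter()
    for start in range(len(seq) - len(mask) + 1):
        window = seq[start:start + len(mask)]
        # Keep only the bases at match positions.
        key = "".join(window[i] for i in keep)
        # Skip windows whose match positions hold ambiguous bases such as N.
        if set(key) <= set("ACGT"):
            counts[key] += 1
    return counts


def feature_vector(seq: str, masks: list[str]) -> list[float]:
    """Concatenate relative frequencies for each mask in a fixed order."""
    features = []
    for mask in masks:
        weight = mask.count("1")  # number of match positions
        counts = spaced_kmer_counts(seq, mask)
        total = sum(counts.values()) or 1  # avoid division by zero
        for letters in product("ACGT", repeat=weight):
            features.append(counts["".join(letters)] / total)
    return features


# Masks taken from the model [111 1111 10001] reported in the abstract.
model = ["111", "1111", "10001"]
fragment = "ACGTACGTTGCA"  # hypothetical toy fragment
vec = feature_vector(fragment, model)
print(len(vec))  # 4**3 + 4**4 + 4**2 = 336
```

Under these assumptions the mask `10001` spans five bases but contributes only 4^2 = 16 features, whereas a contiguous 5-mer would contribute 4^5 = 1024.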