Species Identification Using Partial DNA Sequence: A Machine Learning Approach
Tasnim Kabir, Abida Sanjana Shemonti, A. Rahman
2018 IEEE 18th International Conference on Bioinformatics and Bioengineering (BIBE), published 2018-10-01
DOI: 10.1109/BIBE.2018.00052
Citations: 4
Abstract
Species identification with partial DNA sequences has proven effective for a wide range of organisms. A DNA barcode is a short genetic marker in an organism's DNA that identifies the species to which the organism belongs. In this work, we analyze how effectively supervised machine learning methods classify species from DNA barcodes. We select specimens from phylogenetically diverse species in the animal, plant, and fungus kingdoms. We consider the following supervised machine learning methods: simple logistic function, random forest, PART, instance-based k-nearest neighbor, attribute-based classifier, and bagging. Results on various datasets show that the classification performance of the selected methods is encouraging, with an average accuracy of 93.66%. This is roughly a 6% relative improvement (5.29 percentage points) over state-of-the-art DNA barcode classification methods, which average 88.37% accuracy.
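The abstract does not say how barcode sequences are converted into features for these classifiers. As a minimal sketch, assuming each sequence is encoded as a vector of k-mer relative frequencies (an assumption, not the authors' stated encoding), the Python below implements an instance-based k-nearest-neighbor classifier, one of the listed methods, on toy barcode fragments. The helper names `kmer_vector`, `knn_predict`, and `accuracy`, the sequences, and the species labels are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Relative frequency of every k-mer over ACGT (assumed encoding, not from the paper)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made of A/C/G/T are counted; ambiguous bases such as N are skipped.
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query_seq, n_neighbors=1, k=3):
    """Instance-based k-NN: majority species label among the nearest training barcodes."""
    q = kmer_vector(query_seq, k)
    ranked = sorted((euclidean(kmer_vector(s, k), q), label) for s, label in train)
    votes = Counter(label for _, label in ranked[:n_neighbors])
    return votes.most_common(1)[0][0]


def accuracy(train, test, **kwargs):
    """Fraction of (sequence, label) pairs in `test` that knn_predict labels correctly."""
    correct = sum(knn_predict(train, s, **kwargs) == label for s, label in test)
    return correct / len(test)


# Hypothetical toy barcode fragments; real barcodes are much longer.
TRAIN = [
    ("ACGTACGTACGTACGTACGT", "species_A"),
    ("ACGTACGTACGAACGTACGT", "species_A"),
    ("GGGCCCGGGCCCGGGCCCGG", "species_B"),
    ("GGGCCCGGGCCAGGGCCCGG", "species_B"),
]
TEST = [
    ("ACGTACGTACGTACGTACGA", "species_A"),
    ("GGGCCCGGGCCCGGGCCCGA", "species_B"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "ACGTACGTACGTACGTACGA"))  # species_A
    print(accuracy(TRAIN, TEST))  # 1.0
```

The k-mer encoding turns variable-length sequences into fixed-length numeric vectors, which is what most off-the-shelf classifiers (random forest, bagging, logistic models) expect. A realistic evaluation would use held-out specimens, for example via cross-validation, rather than the two toy queries shown here.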