Analysis of DNA Sequence Classification Using SVM Model with Hyperparameter Tuning Grid Search CV
Iis Setiawan Mangkunegara, P. Purwono
2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), published 2022-06-16
DOI: 10.1109/CyberneticsCom55287.2022.9865624
Citations: 3
Abstract
Viruses and bacteria are constantly evolving. Early identification of pathogens can help limit the spread of disease and can inform drug design. DNA sequence classification is an essential task in computational biology. Pathogen identification was carried out by comparing sequenced genomes with NCBI data. Machine learning can classify DNA sequences whose origin is unclear, even when the sequences are long and difficult to match. This study proposes a support vector machine (SVM) classification model. Because the accuracy of the untuned model is not optimal, its hyperparameters need to be optimized. In contrast to previous studies, we apply grid search cross-validation (grid search CV) to the SVM classification model. A polynomial kernel of degree 2 is the best parameter setting recommended by grid search CV. Accuracy is 77% before optimization and 90% after, an improvement of roughly 14% from applying grid search CV to DNA sequence classification with the SVM model.
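The abstract does not state how sequences were encoded, which SVM implementation was used, or which grid was searched. The following is a minimal sketch of the general workflow under stated assumptions, not the authors' method: sequences are encoded as L2-normalized overlapping k-mer counts (an assumed encoding), a kernelized Pegasos solver (stochastic subgradient descent on the SVM hinge-loss objective) stands in for a library SVM, and every (polynomial degree, regularization strength) pair is scored by k-fold cross-validation. All function names are hypothetical, and the data are synthetic GC-rich versus AT-rich toy sequences, not the NCBI data used in the study.

```python
import itertools
import math
import random

def kmer_vector(seq, k=3):
    """Assumed encoding: overlapping k-mer counts over ACGT, L2-normalized."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = [0.0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # k-mers containing N or other ambiguity codes are skipped
            vec[j] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def poly_kernel(x, z, degree):
    """Polynomial kernel (x.z + 1)^degree; the constant term acts like a bias."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** degree

def train_svm(X, y, degree, lam, epochs=20, seed=0):
    """Kernelized Pegasos; returns per-sample margin-violation counts (alpha)."""
    rng = random.Random(seed)
    n = len(X)
    K = [[poly_kernel(X[i], X[j], degree) for j in range(n)] for i in range(n)]
    alpha = [0] * n
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(n), n):
            t += 1
            score = sum(alpha[j] * y[j] * K[j][i] for j in range(n)) / (lam * t)
            if y[i] * score < 1:  # hinge-loss subgradient step
                alpha[i] += 1
    return alpha

def predict(alpha, X_train, y_train, x, degree):
    # The positive 1/(lam*T) scale factor is dropped: it does not change the sign.
    s = sum(a * yj * poly_kernel(xj, x, degree)
            for a, yj, xj in zip(alpha, y_train, X_train) if a)
    return 1 if s >= 0 else -1

def cv_accuracy(X, y, degree, lam, folds=3):
    """k-fold CV: every sample lands in exactly one test fold."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        alpha = train_svm(Xtr, ytr, degree, lam)
        correct += sum(predict(alpha, Xtr, ytr, X[i], degree) == y[i] for i in test)
    return correct / len(X)

def grid_search_cv(X, y, degrees, lams, folds=3):
    """Exhaustively score every (degree, lambda) pair; the first setting wins ties."""
    best_params, best_acc = None, -1.0
    for degree, lam in itertools.product(degrees, lams):
        acc = cv_accuracy(X, y, degree, lam, folds)
        if acc > best_acc:
            best_params, best_acc = (degree, lam), acc
    return best_params, best_acc

def make_toy_data(n_per_class=12, length=60, seed=1):
    """Synthetic GC-rich (+1) vs AT-rich (-1) sequences; not the study's data."""
    rng = random.Random(seed)
    data = []
    for label, weights in ((1, [1, 3, 3, 1]), (-1, [3, 1, 1, 3])):  # weights for A, C, G, T
        for _ in range(n_per_class):
            data.append(("".join(rng.choices("ACGT", weights=weights, k=length)), label))
    rng.shuffle(data)
    return data

data = make_toy_data()
X = [kmer_vector(seq) for seq, _ in data]
y = [label for _, label in data]
best_params, best_acc = grid_search_cv(X, y, degrees=(1, 2, 3), lams=(0.01, 0.1, 1.0))
print("best (degree, lambda):", best_params, "CV accuracy:", round(best_acc, 2))
```

The hand-written loop is only meant to make each step visible. In practice, scikit-learn's `GridSearchCV` wrapped around `sklearn.svm.SVC(kernel="poly")` performs an analogous search over parameters such as `degree` and `C`, although its kernel is parameterized as `(gamma * x.z + coef0)^degree` and it uses a different solver.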