Influence of Relief Feature Selection on Random Forest and Support Vector Machine Classification Algorithm

Iustisia Natalia Simbolon, Romual Naibaho
Published in: 2022 7th International Workshop on Big Data and Information Security (IWBIS)
Publication date: 2022-10-01
DOI: 10.1109/IWBIS56557.2022.9924782 (https://doi.org/10.1109/IWBIS56557.2022.9924782)

Abstract

Classification is a widely used machine learning technique for assigning data to a target class. Building a classification model is an open-ended task, and models continue to evolve over time. No method guarantees a perfect classifier, but several techniques can improve one. This study applies feature selection to improve classification accuracy. Of the many feature selection algorithms available, this research uses Relief, combined with two classification algorithms: Random Forest and Support Vector Machine. Grid Search Optimization is used both to select the most influential features and to choose the best hyperparameters for the classification model. K-Fold Cross Validation is used to split the data set so as to obtain the best data-splitting proportion. Comparing accuracy before and after feature selection, both classification algorithms significantly outperform their counterparts trained without feature selection. Validation on new data also showed that the models perform quite well in a real-world setting.
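The abstract does not state which Relief variant, data set, or feature threshold the authors used, so the following is only a minimal sketch of the general idea behind Relief for a two-class problem. It assumes numeric features, at least two instances per class, and a deterministic pass over every instance (the classic algorithm samples reference instances at random). The names `relief_weights` and `select_top_k` and the toy data are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Relief-style feature weights for a two-class data set (illustrative).

    Assumption: every instance is used once as the reference instance, so the
    result is deterministic; each class must contain at least two instances.
    """
    n, d = len(X), len(X[0])

    # Range of each feature, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / spans[j]

    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        near_hit = X[min(hits, key=lambda k: distance(X[i], X[k]))]
        near_miss = X[min(misses, key=lambda k: distance(X[i], X[k]))]
        for j in range(d):
            # Reward features that differ from the nearest other-class
            # instance; penalize features that differ from the nearest
            # same-class instance.
            weights[j] += (diff(j, X[i], near_miss) - diff(j, X[i], near_hit)) / n
    return weights


def select_top_k(weights: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-weighted features, best first."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:k]


# Hypothetical toy data: feature 0 tracks the class, feature 1 is noise.
X_toy = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y_toy = [0, 0, 0, 1, 1, 1]
toy_weights = relief_weights(X_toy, y_toy)
top_features = select_top_k(toy_weights, 1)
```

In a workflow like the one the abstract describes, the columns returned by `select_top_k` would be passed on to the Random Forest or Support Vector Machine classifier, with the number of retained features `k` treated as one more value to tune alongside the classifier hyperparameters.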