The Impact of Data Pre-Processing Techniques and Dimensionality Reduction on the Accuracy of Machine Learning

Hadeel Obaid, Saad Ahmed Dheyab, Sana Sabah Sabry
Published in: 2019 9th Annual Information Technology, Electromechanical Engineering and Microelectronics Conference (IEMECON), 2019-03-01
DOI: 10.1109/IEMECONX.2019.8877011 (https://doi.org/10.1109/IEMECONX.2019.8877011)
Citations: 43

Abstract

Data pre-processing is considered a core stage in machine learning and data mining. Normalization, discretization, and dimensionality reduction are well-known pre-processing techniques. This paper examines the effects of Min-max, Z-score, Decimal Scaling, and base-2 logarithm transformations on the accuracy of the J48 classifier using the NSL-KDD dataset. Experiments were conducted with each method, and the results were compared with one another. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were tested for dimensionality reduction; in addition, a hybrid combination of PCA and LDA was tried, and it showed improved classification accuracy compared with either method alone.
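
The abstract names the four transformations but does not give their formulas. As a rough illustration, the textbook definitions can be sketched in plain Python as below. The function names are hypothetical, and the `log2(1 + v)` offset for zero-valued features is an assumption, not something the abstract specifies; the paper's exact variants may differ.

```python
import math
import statistics


def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map everything to new_min
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant feature: every z-score is zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer with max(|v|) / 10**j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def log2_transform(values):
    """Apply log2(1 + v); the +1 offset is an assumption so that zeros stay finite."""
    return [math.log2(1 + v) for v in values]


if __name__ == "__main__":
    # Hypothetical non-negative feature column, not taken from NSL-KDD.
    column = [0, 3, 120, 999]
    for transform in (min_max, z_score, decimal_scaling, log2_transform):
        print(transform.__name__, transform(column))
```

Each function works on a single feature column. In an experiment of the kind the abstract describes, the chosen transformation would be applied to every numeric feature before the classifier is trained.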