PEARSON CORRELATION COEFFICIENT K-NEAREST NEIGHBOR OUTLIER CLASSIFICATION ON REAL-TIME DATASETS

D. Rajakumari, S. Karthika
Journal: Programmable Device Circuits and Systems
DOI: https://doi.org/10.21917/ijsc.2020.0290
Citations: 2

Abstract

Detecting and classifying data that deviate from expected behavior (outliers) plays a major role in a wide variety of applications, such as military surveillance, intrusion detection in cyber security, and fraud detection in online transactions. Accurate detection of outliers in high-dimensional data remains a major issue, and outlier prediction and classification must balance high accuracy against low computational time. A large number of diverse features calls for a reduction mechanism prior to classification. To this end, this paper proposes Distance-based Outlier Classification (DOC). The proposed work uses the Pearson Correlation Coefficient (PCC) to measure the correlation between data instances; minimum instance learning through PCC estimation reduces the dimensionality. The work is split into two phases, training and testing. During training, labeling the most frequent samples isolates them from the infrequent ones and effectively reduces the data size. The testing phase employs the k-Nearest Neighbor (k-NN) scheme to classify the frequent samples. The dimensionality and the value of k are inversely proportional to each other, so in the proposed work selecting a large value of k offers a significant reduction in dimensionality. The combination of PCC-based instance learning and a high value of k reduces the dimensionality and the noise, respectively. A comparative analysis of the proposed PCC-k-NN against conventional algorithms such as Decision Tree, Naive Bayes, Instance-Based K-means (IBK), and Triangular Boundary-based Classification (TBC) in terms of sensitivity, specificity, accuracy, precision, and recall demonstrates its effectiveness in outlier classification (OC). In addition, experimental validation of the proposed PCC-k-NN against state-of-the-art methods in terms of execution time confirms the trade-off between low time consumption and high accuracy.
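The abstract does not spell out how the PCC enters the k-NN step, so the following is only a minimal sketch of one plausible reading: the Pearson correlation between two instances is turned into a distance (here `1 - r`, which is an assumption, not the authors' stated formula) and a plain majority vote is taken over the k closest training instances. The function names, the toy data, and the handling of constant vectors are all hypothetical, and the frequent-sample labeling used in the paper's training phase is not reproduced.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: a constant vector is treated as uncorrelated.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def pcc_distance(x, y):
    """Assumed distance: 0 for perfectly correlated instances, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training instances closest to `query`."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: pcc_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data: "normal" instances rise across features, "outlier" ones fall.
train_X = [
    [1, 2, 3, 4],
    [2, 4, 6, 8.5],
    [1, 3, 5, 8],
    [4, 3, 2, 1],
    [9, 6, 4, 1],
    [8, 5, 5, 2],
]
train_y = ["normal", "normal", "normal", "outlier", "outlier", "outlier"]

print(knn_predict(train_X, train_y, [2, 3, 5, 6], k=3))
print(knn_predict(train_X, train_y, [7, 5, 3, 2], k=3))
```

In this toy setup the first query rises across its features and the second falls, so they correlate positively with the "normal" and "outlier" groups respectively. A correlation-based distance ignores the scale and offset of an instance, which may or may not match what the paper's DOC scheme intends.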