SMART-TV: a fast and scalable nearest neighbor based classifier for data mining

T. Abidin, W. Perrizo
{"title":"SMART-TV: a fast and scalable nearest neighbor based classifier for data mining","authors":"T. Abidin, W. Perrizo","doi":"10.1145/1141277.1141403","DOIUrl":null,"url":null,"abstract":"K-nearest neighbors (KNN) is the simplest method for classification. Given a set of objects in a multi-dimensional feature space, the method assigns a category to an unclassified object based on the plurality of category of the k-nearest neighbors. The closeness between objects is determined using a distance measure, e.g. Euclidian distance. Despite its simplicity, KNN also has some drawbacks: 1) it suffers from expensive computational cost in training when the training set contains millions of objects; 2) its classification time is linear to the size of the training set. The larger the training set, the longer it takes to search for the k-nearest neighbors. In this paper, we propose a new algorithm, called SMART-TV (Small Absolute difference of Total Variation), that approximates a set of potential candidates of nearest neighbors by examining the absolute difference of total variation between each data object in the training set and the unclassified object. Then, the k-nearest neighbors are searched from that candidate set. We empirically evaluate the performance of our algorithm on both real and synthetic datasets and find that SMART-TV is fast and scalable. The classification accuracy of SMART-TV is high and comparable to the accuracy of the traditional KNN algorithm.","PeriodicalId":269830,"journal":{"name":"Proceedings of the 2006 ACM symposium on Applied computing","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2006-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2006 ACM symposium on Applied computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1141277.1141403","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 36

Abstract

K-nearest neighbors (KNN) is the simplest method for classification. Given a set of objects in a multi-dimensional feature space, the method assigns a category to an unclassified object based on the plurality of category of the k-nearest neighbors. The closeness between objects is determined using a distance measure, e.g. Euclidian distance. Despite its simplicity, KNN also has some drawbacks: 1) it suffers from expensive computational cost in training when the training set contains millions of objects; 2) its classification time is linear to the size of the training set. The larger the training set, the longer it takes to search for the k-nearest neighbors. In this paper, we propose a new algorithm, called SMART-TV (Small Absolute difference of Total Variation), that approximates a set of potential candidates of nearest neighbors by examining the absolute difference of total variation between each data object in the training set and the unclassified object. Then, the k-nearest neighbors are searched from that candidate set. We empirically evaluate the performance of our algorithm on both real and synthetic datasets and find that SMART-TV is fast and scalable. The classification accuracy of SMART-TV is high and comparable to the accuracy of the traditional KNN algorithm.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SMART-TV:一个快速和可扩展的基于最近邻的数据挖掘分类器
k近邻(KNN)是最简单的分类方法。该方法在多维特征空间中给定一组对象,根据其k近邻的类别个数为未分类对象分配类别。物体之间的距离是用距离度量来确定的,例如欧几里得距离。尽管简单,但KNN也有一些缺点:1)当训练集包含数百万个对象时,它的训练计算成本昂贵;2)其分类时间与训练集的大小成线性关系。训练集越大,搜索k个最近邻所需的时间就越长。在本文中,我们提出了一种新的算法,称为SMART-TV (Small Absolute difference of Total Variation),它通过检查训练集中每个数据对象与未分类对象之间的总变化的绝对差来近似一组最近邻的潜在候选对象。然后,从候选集合中搜索k个最近的邻居。我们对算法在真实数据集和合成数据集上的性能进行了实证评估,发现SMART-TV快速且可扩展。SMART-TV的分类精度高,可与传统KNN算法的分类精度相媲美。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
File system framework for organizing sensor networks Editorial message: special track on operating systems and adaptive applications Simplifying transformation of software architecture constraints Session details: Software engineering: sound solutions for the 21st century To infinity and beyond or, avoiding the infinite in security protocol analysis
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1