Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures

Jihyun Ryoo, Meenakshi Arunachalam, R. Khanna, M. Kandemir
{"title":"Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures","authors":"Jihyun Ryoo, Meenakshi Arunachalam, R. Khanna, M. Kandemir","doi":"10.1109/ISQED.2018.8357279","DOIUrl":null,"url":null,"abstract":"Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training-sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels. This in turn entails one to have efficient implementations of the frequently-used ML kernels on state-of-the-art multicores and many-cores, to act as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, namely, K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternate accelerator-based systems: NVIDIA GPU and Intel Xeon Phi (both KNC and KNL architectures). More explicitly, we discuss and experimentally evaluate various optimizations that can be applied to both GPU and Xeon Phi, as well as optimizations that are specific to either GPU or Xeon Phi. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our experimental evaluations suggest that, by using both general purpose and accelerator specific optimizations, one can achieve average speedups ranging 0.49x–3.48x (training) and 1.43x–9.41x (classification) on Xeon Phi series, compared to 0.05x–0.60x (training), 1.61x–6.32x (classification) achieved by the GPU version, both over the standard host-only system.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 19th International Symposium on Quality Electronic Design (ISQED)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISQED.2018.8357279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels, which in turn calls for efficient implementations of the frequently used ML kernels on state-of-the-art multicores and many-cores acting as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternative accelerator-based systems: an NVIDIA GPU and the Intel Xeon Phi (both the KNC and KNL architectures). More explicitly, we discuss and experimentally evaluate optimizations that can be applied to both the GPU and the Xeon Phi, as well as optimizations specific to one or the other. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our experimental evaluations suggest that, by combining general-purpose and accelerator-specific optimizations, one can achieve average speedups of 0.49x–3.48x (training) and 1.43x–9.41x (classification) on the Xeon Phi series, compared to 0.05x–0.60x (training) and 1.61x–6.32x (classification) for the GPU version, both measured against the standard host-only system.
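
For orientation, the sketch below shows the brute-force KNN classification kernel that work of this kind targets: a distance computation over all training points, a k-selection step, and a majority vote. This is a minimal illustrative C++/OpenMP sketch, not the authors' GPU or Xeon Phi implementation; the helper names (sq_dist, knn_classify) and the toy data in main are hypothetical.

```cpp
// Minimal brute-force KNN classification sketch (illustrative only).
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

// Squared Euclidean distance between two feature vectors.
static float sq_dist(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Classify one query point: find the k nearest training points
// and return the majority label among them.
int knn_classify(const std::vector<std::vector<float>>& train,
                 const std::vector<int>& labels,
                 const std::vector<float>& query, int k) {
    std::vector<std::pair<float, int>> dist(train.size());
    // Distance computation is the throughput-bound phase that maps
    // naturally onto GPU threads or Xeon Phi cores and vector lanes.
    #pragma omp parallel for
    for (long i = 0; i < (long)train.size(); ++i)
        dist[i] = {sq_dist(train[i], query), labels[i]};
    // Partial selection of the k smallest distances; replacing a full
    // sort with k-selection is a typical accelerator optimization.
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    // Majority vote over the k nearest labels.
    std::map<int, int> votes;
    for (int i = 0; i < k; ++i) ++votes[dist[i].second];
    int best = -1, best_count = 0;
    for (const auto& v : votes)
        if (v.second > best_count) { best = v.first; best_count = v.second; }
    return best;
}

int main() {
    std::vector<std::vector<float>> train = {{0, 0}, {1, 0}, {5, 5}, {6, 5}};
    std::vector<int> labels = {0, 0, 1, 1};
    std::printf("label = %d\n", knn_classify(train, labels, {5.5f, 5.0f}, 3));
    return 0;
}
```

Even in this toy form, the two phases that dominate runtime, the all-pairs distance computation and the k-selection, are the ones where GPU-specific and Xeon-Phi-specific optimizations of the kind the paper evaluates would apply.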