Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures

Jihyun Ryoo, Meenakshi Arunachalam, R. Khanna, M. Kandemir
{"title":"Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures","authors":"Jihyun Ryoo, Meenakshi Arunachalam, R. Khanna, M. Kandemir","doi":"10.1109/ISQED.2018.8357279","DOIUrl":null,"url":null,"abstract":"Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training-sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels. This in turn entails one to have efficient implementations of the frequently-used ML kernels on state-of-the-art multicores and many-cores, to act as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, namely, K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternate accelerator-based systems: NVIDIA GPU and Intel Xeon Phi (both KNC and KNL architectures). More explicitly, we discuss and experimentally evaluate various optimizations that can be applied to both GPU and Xeon Phi, as well as optimizations that are specific to either GPU or Xeon Phi. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our experimental evaluations suggest that, by using both general purpose and accelerator specific optimizations, one can achieve average speedups ranging 0.49x–3.48x (training) and 1.43x–9.41x (classification) on Xeon Phi series, compared to 0.05x–0.60x (training), 1.61x–6.32x (classification) achieved by the GPU version, both over the standard host-only system.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 19th International Symposium on Quality Electronic Design (ISQED)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISQED.2018.8357279","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels, which in turn calls for efficient implementations of the frequently used ML kernels on state-of-the-art multicores and many-cores acting as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternative accelerator-based systems: an NVIDIA GPU and the Intel Xeon Phi (both the KNC and KNL architectures). More explicitly, we discuss and experimentally evaluate optimizations that can be applied to both the GPU and the Xeon Phi, as well as optimizations specific to one or the other. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our experimental evaluations suggest that, by combining general-purpose and accelerator-specific optimizations, one can achieve average speedups of 0.49x–3.48x (training) and 1.43x–9.41x (classification) on the Xeon Phi series, compared to 0.05x–0.60x (training) and 1.61x–6.32x (classification) for the GPU version, both measured against the standard host-only system.
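
For orientation, the sketch below shows the brute-force KNN classification kernel that work of this kind targets: a distance computation over all training points, a k-selection step, and a majority vote. This is a minimal illustrative C++/OpenMP sketch, not the authors' GPU or Xeon Phi implementation; the helper names (sq_dist, knn_classify) and the toy data in main are hypothetical.

```cpp
// Minimal brute-force KNN classification sketch (illustrative only).
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

// Squared Euclidean distance between two feature vectors.
static float sq_dist(const std::vector<float>& a, const std::vector<float>& b) {
    float s = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// Classify one query point: find the k nearest training points
// and return the majority label among them.
int knn_classify(const std::vector<std::vector<float>>& train,
                 const std::vector<int>& labels,
                 const std::vector<float>& query, int k) {
    std::vector<std::pair<float, int>> dist(train.size());
    // Distance computation is the throughput-bound phase that maps
    // naturally onto GPU threads or Xeon Phi cores and vector lanes.
    #pragma omp parallel for
    for (long i = 0; i < (long)train.size(); ++i)
        dist[i] = {sq_dist(train[i], query), labels[i]};
    // Partial selection of the k smallest distances; replacing a full
    // sort with k-selection is a typical accelerator optimization.
    std::partial_sort(dist.begin(), dist.begin() + k, dist.end());
    // Majority vote over the k nearest labels.
    std::map<int, int> votes;
    for (int i = 0; i < k; ++i) ++votes[dist[i].second];
    int best = -1, best_count = 0;
    for (const auto& v : votes)
        if (v.second > best_count) { best = v.first; best_count = v.second; }
    return best;
}

int main() {
    std::vector<std::vector<float>> train = {{0, 0}, {1, 0}, {5, 5}, {6, 5}};
    std::vector<int> labels = {0, 0, 1, 1};
    std::printf("label = %d\n", knn_classify(train, labels, {5.5f, 5.0f}, 3));
    return 0;
}
```

Even in this toy form, the two phases that dominate runtime, the all-pairs distance computation and the k-selection, are the ones where GPU-specific and Xeon-Phi-specific optimizations of the kind the paper evaluates would apply.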