SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors

Yifeng Tang, Cho-Li Wang
{"title":"SelB-k-NN: A Mini-Batch K-Nearest Neighbors Algorithm on AI Processors","authors":"Yifeng Tang, Cho-Li Wang","doi":"10.1109/IPDPS54959.2023.00088","DOIUrl":null,"url":null,"abstract":"The popularity of Artificial Intelligence (AI) motivates novel domain-specific hardware named AI processors. With a design trade-off, the AI processors feature incredible computation power for matrix multiplications and activations, while some leave other operations less powerful, e.g., scalar operations and vectorized comparisons & selections. For k-nearest neighbors (k-NN) algorithm, consisting of distance computation phase and k-selection phase, while the former is naturally accelerated, the previous efficient k-selection becomes problematic. Moreover, limited memory forces k-NN to adopt a mini-batch manner with tiling technique. As the distance computation’s results are the k-selection’s inputs, the former’s tiling shape determines that of the latter. Since the two phases execute on separate hardware units requiring different performance analyses, whether the former’s tiling strategies benefit the latter and entire k-NN is doubtful.To address the new challenges brought by the AI processors, this paper proposes SelB-k-NN (Selection-Bitonic-k-NN), a mini-batch algorithm inspired by selection sort and bitonic k-selection. SelB-k-NN avoids the expansion of the weakly-supported operations on the huge scale of datasets. To apply SelB-k-NN to various AI processors, we propose two algorithms to reduce the hardware support requirements. Since the matrix multiplication operates data with the specifically-designed memory hierarchy which k-selection does not share, the tiling shape of the former cannot guarantee the best execution of the latter and vice versa. By quantifying the runtime workload variations of k-selection, we formulate an optimization problem to search for the optimal tiling shapes of both phases with an offline pruning method, which reduces the search space in the preprocessing stage. Evaluations show that on Huawei Ascend 310 AI processor, SelB-k-NN achieves 2.01× speedup of the bitonic k-selection, 23.93× of the heap approach, 78.52× of the CPU approach. For mini-batch SelB-k-NN, the optimal tiling shapes for two phases respectively achieve 1.48× acceleration compared with the matrix multiplication tiling shapes and 1.14× with the k-selection tiling shapes, with 72.80% of the search space pruned.","PeriodicalId":343684,"journal":{"name":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS54959.2023.00088","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The popularity of Artificial Intelligence (AI) motivates novel domain-specific hardware called AI processors. As a design trade-off, AI processors offer enormous computing power for matrix multiplications and activations, while leaving other operations, such as scalar arithmetic and vectorized comparisons and selections, comparatively weak. The k-nearest neighbors (k-NN) algorithm consists of a distance computation phase and a k-selection phase: the former is naturally accelerated on such hardware, but previously efficient k-selection methods become problematic. Moreover, limited memory forces k-NN to run in a mini-batch manner using a tiling technique. Because the distance computation's results are the k-selection's inputs, the former's tiling shape determines the latter's. Since the two phases execute on separate hardware units requiring different performance analyses, it is doubtful whether a tiling strategy chosen for distance computation benefits k-selection, or k-NN as a whole.

To address these new challenges brought by AI processors, this paper proposes SelB-k-NN (Selection-Bitonic-k-NN), a mini-batch algorithm inspired by selection sort and bitonic k-selection. SelB-k-NN avoids expanding the weakly supported operations over huge datasets. To make SelB-k-NN applicable to various AI processors, we propose two algorithms that reduce its hardware-support requirements. Because matrix multiplication operates on data through a specially designed memory hierarchy that k-selection does not share, the tiling shape that is optimal for one phase cannot guarantee the best execution of the other, and vice versa. By quantifying the runtime workload variations of k-selection, we formulate an optimization problem that searches for the optimal tiling shapes of both phases, using an offline pruning method that shrinks the search space in a preprocessing stage. Evaluations on the Huawei Ascend 310 AI processor show that SelB-k-NN achieves a 2.01× speedup over bitonic k-selection, 23.93× over the heap approach, and 78.52× over the CPU approach. For mini-batch SelB-k-NN, the jointly optimized tiling shapes for the two phases achieve 1.48× acceleration compared with the matrix-multiplication tiling shapes and 1.14× compared with the k-selection tiling shapes, with 72.80% of the search space pruned.
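To make the two-phase structure described above concrete, the sketch below shows a generic mini-batch k-NN in NumPy: distances are computed one corpus tile at a time through a single matrix multiplication (the operation AI processors accelerate best), and each tile's candidates are merged into a running top-k. This is a minimal illustration under stated assumptions, not the paper's SelB-k-NN implementation: squared Euclidean distance is assumed, np.argpartition stands in for the selection-sort/bitonic k-selection hybrid, and the function name knn_minibatch and the tile parameter are invented for the example.

```python
# Minimal mini-batch k-NN sketch (NOT the paper's SelB-k-NN):
# a matmul-heavy distance phase followed by a k-selection phase.
import numpy as np

def knn_minibatch(queries, corpus, k, tile=1024):
    """Return indices of the k nearest corpus points for each query.

    queries: (m, d) array; corpus: (n, d) array.
    Squared Euclidean distances are computed tile by tile via
    ||q - c||^2 = ||q||^2 - 2 q.c + ||c||^2, so the dominant cost
    is one matrix multiplication per tile.
    """
    m, n = queries.shape[0], corpus.shape[0]
    q_sq = np.sum(queries ** 2, axis=1, keepdims=True)   # (m, 1)
    best_d = np.full((m, k), np.inf)                     # running top-k distances
    best_i = np.full((m, k), -1, dtype=np.int64)         # running top-k indices

    for start in range(0, n, tile):                      # mini-batch over the corpus
        block = corpus[start:start + tile]               # (t, d)
        c_sq = np.sum(block ** 2, axis=1)                # (t,)

        # Distance computation phase: one matmul per tile.
        dist = q_sq - 2.0 * queries @ block.T + c_sq     # (m, t)

        # k-selection phase: merge this tile's candidates with the
        # running top-k and keep the k smallest (np.argpartition is a
        # stand-in for the paper's selection/bitonic k-selection).
        idx = np.arange(start, start + block.shape[0])
        cand_d = np.concatenate([best_d, dist], axis=1)
        cand_i = np.concatenate([best_i, idx[None, :].repeat(m, axis=0)], axis=1)
        keep = np.argpartition(cand_d, k - 1, axis=1)[:, :k]
        best_d = np.take_along_axis(cand_d, keep, axis=1)
        best_i = np.take_along_axis(cand_i, keep, axis=1)
    return best_i
```

Note how the tile size fixes the shape of dist, which is simultaneously the matrix multiplication's output and the k-selection's input; this coupling is exactly why the paper treats the tiling shapes of the two phases as a joint optimization problem rather than tuning the matmul tiling alone.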