Communication-efficient k-Means for Edge-based Machine Learning

Hanlin Lu, T. He, Shiqiang Wang, Changchang Liu, M. Mahdavi, V. Narayanan, Kevin S. Chan, Stephen Pasteris
{"title":"Communication-efficient k-Means for Edge-based Machine Learning","authors":"Hanlin Lu, T. He, Shiqiang Wang, Changchang Liu, M. Mahdavi, V. Narayanan, Kevin S. Chan, Stephen Pasteris","doi":"10.1109/ICDCS47774.2020.00062","DOIUrl":null,"url":null,"abstract":"We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics, and the capability of computing provably accurate k-means centers by leveraging the computation power of the edge servers, at a low communication and computation cost to the data sources, will greatly improve the performance of these analytics. We propose to let the data sources send small summaries, generated by joint dimensionality reduction (DR) and cardinality reduction (CR), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, the communication cost, and the approximation error of k-means algorithms based on state-of-the-art DR/CR methods, we show that: (i) in the single-source case, it is possible to achieve a near-optimal approximation at a near-linear complexity and a constant communication cost, (ii) in the multiple-source case, it is possible to achieve similar performance at a logarithmic communication cost, and (iii) the order of applying DR and CR significantly affects the complexity and the communication cost. Our findings are validated through experiments based on real datasets.","PeriodicalId":158630,"journal":{"name":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS47774.2020.00062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We consider the problem of computing the k-means centers of a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics tasks, and the capability to compute provably accurate k-means centers by leveraging the computation power of edge servers, at a low communication and computation cost to the data sources, would greatly improve the performance of such analytics. We propose letting the data sources send small summaries, generated by joint dimensionality reduction (DR) and cardinality reduction (CR), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, communication cost, and approximation error of k-means algorithms based on state-of-the-art DR/CR methods, we show that: (i) in the single-source case, a near-optimal approximation is achievable at near-linear complexity and a constant communication cost; (ii) in the multiple-source case, similar performance is achievable at a logarithmic communication cost; and (iii) the order in which DR and CR are applied significantly affects the complexity and the communication cost. Our findings are validated through experiments on real datasets.
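The summary pipeline described in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It applies DR via a Johnson-Lindenstrauss-style random projection, CR via a simple importance-sampled weighted coreset, and then runs weighted k-means on the summary, standing in for the computation at the edge server. The dataset, projection dimension, summary size, and sensitivity proxy below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection

# Illustrative stand-in for the DR -> CR summary pipeline; the dataset,
# projection dimension, and summary size are assumptions for this sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 512))  # large, high-dimensional data at one source
k = 10

# Dimensionality reduction (DR): Johnson-Lindenstrauss random projection.
dr = GaussianRandomProjection(n_components=64, random_state=0)
X_low = dr.fit_transform(X)

# Cardinality reduction (CR): importance-sampled weighted coreset.
# Distance to the dataset mean is a crude proxy for sampling sensitivity;
# real coreset constructions use a bicriteria approximate clustering instead.
dist = np.linalg.norm(X_low - X_low.mean(axis=0), axis=1) + 1e-12
prob = dist / dist.sum()
m = 2_000  # summary size: what actually gets sent to the edge server
idx = rng.choice(len(X_low), size=m, replace=True, p=prob)
coreset, weights = X_low[idx], 1.0 / (m * prob[idx])

# Edge server side: weighted k-means on the small summary only.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(
    coreset, sample_weight=weights)
centers_low = km.cluster_centers_  # approximate centers in the reduced space
```

This sketch applies DR before CR; per the abstract's point (iii), the order matters. Reversing it would mean sampling the coreset in the original high dimension and projecting only the sampled points, which changes both the computation at the source and the size of what is transmitted.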