Communication-efficient k-Means for Edge-based Machine Learning

Hanlin Lu, T. He, Shiqiang Wang, Changchang Liu, M. Mahdavi, V. Narayanan, Kevin S. Chan, Stephen Pasteris
{"title":"Communication-efficient k-Means for Edge-based Machine Learning","authors":"Hanlin Lu, T. He, Shiqiang Wang, Changchang Liu, M. Mahdavi, V. Narayanan, Kevin S. Chan, Stephen Pasteris","doi":"10.1109/ICDCS47774.2020.00062","DOIUrl":null,"url":null,"abstract":"We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics, and the capability of computing provably accurate k-means centers by leveraging the computation power of the edge servers, at a low communication and computation cost to the data sources, will greatly improve the performance of these analytics. We propose to let the data sources send small summaries, generated by joint dimensionality reduction (DR) and cardinality reduction (CR), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, the communication cost, and the approximation error of k-means algorithms based on state-of-the-art DR/CR methods, we show that: (i) in the single-source case, it is possible to achieve a near-optimal approximation at a near-linear complexity and a constant communication cost, (ii) in the multiple-source case, it is possible to achieve similar performance at a logarithmic communication cost, and (iii) the order of applying DR and CR significantly affects the complexity and the communication cost. Our findings are validated through experiments based on real datasets.","PeriodicalId":158630,"journal":{"name":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS47774.2020.00062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We consider the problem of computing the k-means centers of a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics tasks, and the capability to compute provably accurate k-means centers by leveraging the computation power of edge servers, at a low communication and computation cost to the data sources, would greatly improve the performance of such analytics. We propose letting the data sources send small summaries, generated by joint dimensionality reduction (DR) and cardinality reduction (CR), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, communication cost, and approximation error of k-means algorithms based on state-of-the-art DR/CR methods, we show that: (i) in the single-source case, a near-optimal approximation is achievable at near-linear complexity and a constant communication cost; (ii) in the multiple-source case, similar performance is achievable at a logarithmic communication cost; and (iii) the order in which DR and CR are applied significantly affects the complexity and the communication cost. Our findings are validated through experiments on real datasets.
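The summary pipeline described in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It applies DR via a Johnson-Lindenstrauss-style random projection, CR via a simple importance-sampled weighted coreset, and then runs weighted k-means on the summary, standing in for the computation at the edge server. The dataset, projection dimension, summary size, and sensitivity proxy below are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection

# Illustrative stand-in for the DR -> CR summary pipeline; the dataset,
# projection dimension, and summary size are assumptions for this sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 512))  # large, high-dimensional data at one source
k = 10

# Dimensionality reduction (DR): Johnson-Lindenstrauss random projection.
dr = GaussianRandomProjection(n_components=64, random_state=0)
X_low = dr.fit_transform(X)

# Cardinality reduction (CR): importance-sampled weighted coreset.
# Distance to the dataset mean is a crude proxy for sampling sensitivity;
# real coreset constructions use a bicriteria approximate clustering instead.
dist = np.linalg.norm(X_low - X_low.mean(axis=0), axis=1) + 1e-12
prob = dist / dist.sum()
m = 2_000  # summary size: what actually gets sent to the edge server
idx = rng.choice(len(X_low), size=m, replace=True, p=prob)
coreset, weights = X_low[idx], 1.0 / (m * prob[idx])

# Edge server side: weighted k-means on the small summary only.
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(
    coreset, sample_weight=weights)
centers_low = km.cluster_centers_  # approximate centers in the reduced space
```

This sketch applies DR before CR; per the abstract's point (iii), the order matters. Reversing it would mean sampling the coreset in the original high dimension and projecting only the sampled points, which changes both the computation at the source and the size of what is transmitted.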