Secure Aggregation for Federated Learning in Flower
Kwing Hei Li, P. P. B. D. Gusmão, Daniel J. Beutel, N. Lane
DOI: 10.1145/3488659.3493776
Federated Learning (FL) allows parties to learn a shared prediction model by delegating the training computation to clients and aggregating the separately trained models on the server. To prevent private information from being inferred from local models, Secure Aggregation (SA) protocols are used to ensure that the server cannot inspect individual trained models as it aggregates them. However, current implementations of SA in FL frameworks have limitations, including vulnerability to client dropouts and difficult configuration. In this paper, we present Salvia, an implementation of SA for Python users in the Flower FL framework. Based on the SecAgg(+) protocols for a semi-honest threat model, Salvia is robust against client dropouts and exposes a flexible, easy-to-use API that is compatible with various machine learning frameworks. We show that Salvia's experimental performance is consistent with SecAgg(+)'s theoretical computation and communication complexities.
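The abstract does not spell out the masking mechanism, so the following is a minimal sketch of the pairwise-masking idea underlying SecAgg(+), not Salvia's actual Flower API. Every name and size here (NUM_CLIENTS, DIM, MOD, pairwise_mask) is an illustrative assumption: each client pair derives the same mask from a shared seed, one side adds it and the other subtracts it, so the masks cancel in the server-side sum while individual updates stay hidden.

```python
# Pairwise-masking sketch of SecAgg(+): masks cancel in the aggregate.
import random

import numpy as np

NUM_CLIENTS = 4   # illustrative sizes
DIM = 8
MOD = 2**32       # all arithmetic is modulo a fixed ring size

def pairwise_mask(seed: int, dim: int) -> np.ndarray:
    """Both clients of a pair expand the shared seed into the same mask."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, MOD, size=dim, dtype=np.uint64)

# In the real protocol, shared seeds come from Diffie-Hellman key agreement
# (and are secret-shared to survive dropouts); here we simply sample them.
seeds = {(i, j): random.getrandbits(32)
         for i in range(NUM_CLIENTS) for j in range(i + 1, NUM_CLIENTS)}
updates = [np.random.randint(0, 100, DIM).astype(np.uint64)
           for _ in range(NUM_CLIENTS)]  # stand-ins for model updates

def masked_update(i: int) -> np.ndarray:
    y = updates[i].copy()
    for j in range(NUM_CLIENTS):
        if j == i:
            continue
        mask = pairwise_mask(seeds[(min(i, j), max(i, j))], DIM)
        # Lower-id client adds the pair's mask, higher-id client subtracts it.
        y = (y + mask) % MOD if i < j else (y - mask) % MOD
    return y

# The server sums masked updates; the pairwise masks cancel exactly.
agg = np.sum([masked_update(i) for i in range(NUM_CLIENTS)], axis=0) % MOD
assert np.array_equal(agg, np.sum(updates, axis=0) % MOD)
```

The dropout robustness claimed in the abstract comes from secret-sharing the seeds so that surviving clients can reconstruct the masks of dropped peers; that machinery is omitted from this sketch.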
{"title":"Secure aggregation for federated learning in flower","authors":"Kwing Hei Li, P. P. B. D. Gusmão, Daniel J. Beutel, N. Lane","doi":"10.1145/3488659.3493776","DOIUrl":"https://doi.org/10.1145/3488659.3493776","url":null,"abstract":"Federated Learning (FL) allows parties to learn a shared prediction model by delegating the training computation to clients and aggregating all the separately trained models on the server. To prevent private information being inferred from local models, Secure Aggregation (SA) protocols are used to ensure that the server is unable to inspect individual trained models as it aggregates them. However, current implementations of SA in FL frameworks have limitations, including vulnerability to client dropouts or configuration difficulties. In this paper, we present Salvia, an implementation of SA for Python users in the Flower FL framework. Based on the SecAgg(+) protocols for a semi-honest threat model, Salvia is robust against client dropouts and exposes a flexible and easy-to-use API that is compatible with various machine learning frameworks. We show that Salvia's experimental performance is consistent with SecAgg(+)'s theoretical computation and communication complexities.","PeriodicalId":343000,"journal":{"name":"Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125600188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Reconstruction Attacks on Distributed Machine Learning Models
Hadjer Benkraouda, K. Nahrstedt
DOI: 10.1145/3488659.3493779

Recent developments in Deep Neural Networks have resulted in their wide deployment for services around many aspects of human life, including security-critical domains that handle sensitive data. Congruently, we have seen a proliferation of IoT devices with limited resources. Together, these two trends have led to the distribution of data analysis, processing, and decision making between edge devices and third parties such as cloud services. In this work we assess the security of previously proposed distributed machine learning (ML) schemes by analyzing the information leaked from the output of the edge devices, i.e., the intermediate representation (IR). We particularly look at a Deep Neural Network used for video/image classification and tackle the problem of image/frame reconstruction from the output of the edge device. Our work focuses on assessing whether the proposed scheme of partitioned enclave execution is secure against chosen-image attacks (CIA). Given that the attacker can query the model under attack (the victim model) to create image-IR pairs, can the attacker reconstruct the private input images? In this work we show that it is possible to carry out a black-box reconstruction attack by training a CNN-based encoder-decoder architecture (the reconstruction model) using image-IR pairs. Our tests show that the proposed reconstruction model achieves 70% similarity between the original image and the reconstructed image.
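To make the attack setup concrete, here is a hedged PyTorch sketch of the kind of decoder an attacker might train on image-IR pairs. The class name, layer sizes, and tensor shapes (a 64x8x8 IR inverted to a 3x32x32 image) are our assumptions for illustration, not the architecture from the paper.

```python
# Train a small transposed-convolution decoder to invert IRs back to images.
import torch
import torch.nn as nn

class IRDecoder(nn.Module):
    """Maps an assumed 64x8x8 IR back to a 3x32x32 image estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 16 -> 32
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, ir: torch.Tensor) -> torch.Tensor:
        return self.net(ir)

decoder = IRDecoder()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

def train_step(image: torch.Tensor, ir: torch.Tensor) -> float:
    """One update on an (image, IR) pair collected by querying the victim."""
    opt.zero_grad()
    loss = loss_fn(decoder(ir), image)
    loss.backward()
    opt.step()
    return loss.item()
```

The attack is black-box: the victim model is only ever queried to produce IRs, never inspected, and the decoder is trained purely on the resulting pairs.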
{"title":"Image reconstruction attacks on distributed machine learning models","authors":"Hadjer Benkraouda, K. Nahrstedt","doi":"10.1145/3488659.3493779","DOIUrl":"https://doi.org/10.1145/3488659.3493779","url":null,"abstract":"Recent developments in Deep Neural Networks have resulted in their wide deployment for services around many aspects of human life, including security critical domains that handle sensitive data. Congruently, we have seen a proliferation of IoT devices with limited resources. Together, these two trends have led to the distribution of data analysis, processing, and decision making between edge devices and third parties such as cloud services. In this work we assess the security of the previously proposed distributed machine learning (ML) schemes by analyzing the information leaked from the output of the edge devices, i.e. the intermediate representation (IR). We particularly look at a Deep Neural Network that is used for video/image classification and tackle the problem of image/frame reconstruction from the output of the edge device. Our work focuses on assessing whether the proposed scheme of partitioned enclave execution is secure against chosen-image attacks (CIA). Given the attacker has the capability of querying the model under attack (victim model) to create image-IR pairs, can the attacker reconstruct the private input images? In this work we show that it is possible to carry out a black-box reconstruction attack by training a CNN based encoder-decoder architecture (reconstruction model) using image-IR pairs. Our tests show that the proposed reconstruction model achieves a 70% similarity between the original image and the reconstructed image.","PeriodicalId":343000,"journal":{"name":"Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123826908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FL_PyTorch: Optimization Research Simulator for Federated Learning
Konstantin Burlachenko, Samuel Horváth, Peter Richtárik
DOI: 10.1145/3488659.3493775
Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared machine learning model while keeping training data locally on the device, thereby removing the need to store and access the full data in the cloud. However, FL is difficult to implement, test, and deploy in practice given the heterogeneity of common edge device settings, making it fundamentally hard for researchers to efficiently prototype and test their optimization algorithms. In this work, we aim to alleviate this problem by introducing FL_PyTorch: a suite of open-source software written in Python that builds on top of one of the most popular research Deep Learning (DL) frameworks, PyTorch. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimentation with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with a sufficient level of flexibility to experiment with existing and novel approaches to advance the state-of-the-art. Furthermore, FL_PyTorch is a simple-to-use console system that can run several clients simultaneously on local CPUs or GPU(s), and even on remote compute devices, without requiring any distributed implementation from the user. FL_PyTorch also offers a Graphical User Interface. For new methods, researchers only provide the centralized implementation of their algorithm. To showcase the possibilities and usefulness of our system, we experiment with several well-known state-of-the-art FL algorithms and a few of the most common FL datasets.
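The abstract does not show FL_PyTorch's actual API, so below is a generic sketch of the kind of single-process FL simulation such a tool performs: sample clients, run local SGD, and average the resulting models (FedAvg). The function name, signature, and the use of DataLoaders as stand-in clients are all assumptions for illustration.

```python
# Generic single-process FedAvg round, simulating clients sequentially.
import copy

import torch

def fedavg_round(global_model, client_loaders, local_epochs=1, lr=0.1):
    """Run local SGD on each simulated client, then average parameters."""
    client_states = []
    for loader in client_loaders:            # each "client" is a DataLoader
        local = copy.deepcopy(global_model)  # start from the global model
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                torch.nn.functional.cross_entropy(local(x), y).backward()
                opt.step()
        client_states.append(local.state_dict())
    # Average every parameter tensor across clients (the FedAvg step).
    avg = {k: torch.stack([s[k].float() for s in client_states]).mean(0)
           for k in client_states[0]}
    global_model.load_state_dict(avg)
    return global_model
```

A simulator in this style needs only the centralized loop above; distributing clients across local GPUs or remote machines is an execution detail the framework handles for the researcher.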
{"title":"FL_PyTorch: optimization research simulator for federated learning","authors":"Konstantin Burlachenko, Samuel Horváth, Peter Richtárik","doi":"10.1145/3488659.3493775","DOIUrl":"https://doi.org/10.1145/3488659.3493775","url":null,"abstract":"Federated Learning (FL) has emerged as a promising technique for edge devices to collaboratively learn a shared machine learning model while keeping training data locally on the device, thereby removing the need to store and access the full data in the cloud. However, FL is difficult to implement, test and deploy in practice considering heterogeneity in common edge device settings, making it fundamentally hard for researchers to efficiently prototype and test their optimization algorithms. In this work, our aim is to alleviate this problem by introducing FL_PyTorch : a suite of open-source software written in python that builds on top of one the most popular research Deep Learning (DL) framework PyTorch. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping and experimenting with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with a sufficient level of flexibility to experiment with existing and novel approaches to advance the state-of-the-art. Furthermore, FL_PyTorch is a simple to use console system, allows to run several clients simultaneously using local CPUs or GPU(s), and even remote compute devices without the need for any distributed implementation provided by the user. FL_PyTorch also offers a Graphical User Interface. For new methods, researchers only provide the centralized implementation of their algorithm. To showcase the possibilities and usefulness of our system, we experiment with several well-known state-of-the-art FL algorithms and a few of the most common FL datasets.","PeriodicalId":343000,"journal":{"name":"Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129078076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Doing More by Doing Less: How Structured Partial Backpropagation Improves Deep Learning Clusters
Adarsh Kumar, Kausik Subramanian, S. Venkataraman, Aditya Akella
DOI: 10.1145/3488659.3493778
Many organizations employ compute clusters equipped with accelerators such as GPUs and TPUs for training deep learning models in a distributed fashion. Training is resource-intensive, consuming significant compute, memory, and network resources. Many prior works explore how to reduce the training resource footprint without impacting quality, but their focus on a subset of the bottlenecks (typically only the network) limits their ability to improve overall cluster utilization. In this work, we exploit the unique characteristics of deep learning workloads to propose Structured Partial Backpropagation (SPB), a technique that systematically controls the amount of backpropagation at individual workers in distributed training. This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality. To efficiently leverage the benefits of SPB at the cluster level, we introduce Jigsaw, an SPB-aware scheduler that schedules Deep Learning Training (DLT) jobs at the iteration level. We find that Jigsaw can improve large-scale cluster efficiency by up to 28%.
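A hedged sketch of the core idea: limit how far gradients flow back on a given worker, so lower layers incur no backward compute and contribute no gradient traffic to synchronize. The depth-setting helper below is our illustration of "controlling the amount of backpropagation"; Jigsaw's actual iteration-level policy is not shown in the abstract.

```python
# Limit backprop to the top-k parameterized layers of a model.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

def set_backprop_depth(model: nn.Sequential, trainable_top_k: int) -> None:
    """Leave only the last `trainable_top_k` parameterized layers trainable."""
    layers = [m for m in model if any(True for _ in m.parameters())]
    for idx, layer in enumerate(layers):
        keep = idx >= len(layers) - trainable_top_k
        for p in layer.parameters():
            p.requires_grad_(keep)

# This worker backpropagates only through the final layer this iteration;
# autograd stops early because no earlier tensor requires gradients, and in
# data-parallel training the frozen layers' gradients need not be exchanged.
set_backprop_depth(model, trainable_top_k=1)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
nn.functional.cross_entropy(model(x), y).backward()
print([n for n, p in model.named_parameters() if p.grad is not None])
# -> only the last Linear layer's weight and bias
```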
{"title":"Doing more by doing less: how structured partial backpropagation improves deep learning clusters","authors":"Adarsh Kumar, Kausik Subramanian, S. Venkataraman, Aditya Akella","doi":"10.1145/3488659.3493778","DOIUrl":"https://doi.org/10.1145/3488659.3493778","url":null,"abstract":"Many organizations employ compute clusters equipped with accelerators such as GPUs and TPUs for training deep learning models in a distributed fashion. Training is resource-intensive, consuming significant compute, memory, and network resources. Many prior works explore how to reduce training resource footprint without impacting quality, but their focus on a subset of the bottlenecks (typically only the network) limits their ability to improve overall cluster utilization. In this work, we exploit the unique characteristics of deep learning workloads to propose Structured Partial Backpropagation(SPB), a technique that systematically controls the amount of backpropagation at individual workers in distributed training. This simultaneously reduces network bandwidth, compute utilization, and memory footprint while preserving model quality. To efficiently leverage the benefits of SPB at cluster level, we introduce Jigsaw, a SPB aware scheduler, which does scheduling at the iteration level for Deep Learning Training(DLT) jobs. We find that Jigsaw can improve large scale cluster efficiency by as high as 28%.","PeriodicalId":343000,"journal":{"name":"Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131025052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rapid IoT Device Identification at the Edge
O. Thompson, A. Mandalari, H. Haddadi
DOI: 10.1145/3488659.3493777

Consumer Internet of Things (IoT) devices are increasingly common in everyday homes, from smart speakers to security cameras. Along with their benefits come potential privacy and security threats. To limit these threats, we must implement solutions that filter IoT traffic at the edge, and to this end, identifying the IoT device is the first natural step. In this paper we demonstrate a novel method for rapid IoT device identification that uses neural networks trained on device DNS traffic, which can be captured from a DNS server on the local network. The method identifies devices by fitting a model to the first seconds of DNS second-level-domain traffic following their first connection. Since security and privacy threat detection often operate at a device-specific level, rapid identification allows these strategies to be applied immediately. Through a total of 51,000 rigorous automated experiments, we classify 30 consumer IoT devices from 27 different manufacturers with 82% and 93% accuracy for product type and device manufacturer, respectively.
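As a rough illustration of the classification task (not the paper's model or data), the sketch below treats the second-level domains a device queries in its first seconds online as a bag of tokens and fits a small neural network. The domain strings, labels, and pipeline choices are all made-up assumptions.

```python
# Classify a device from the multiset of second-level domains (SLDs)
# observed right after its first connection.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Each sample is the space-joined SLD sequence from one boot capture;
# these domains and labels are fabricated examples for the sketch.
X = [
    "ntp.org amazonaws.com example-speaker.com",
    "ntp.org example-speaker.com amazonaws.com",
    "pool.ntp.org example-cam.com amazonaws.com",
    "example-cam.com pool.ntp.org amazonaws.com",
]
y = ["smart_speaker", "smart_speaker", "security_camera", "security_camera"]

clf = make_pipeline(
    CountVectorizer(token_pattern=r"[^ ]+"),  # one token per domain name
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
clf.fit(X, y)
print(clf.predict(["amazonaws.com ntp.org example-speaker.com"]))
```

Because only the first seconds of DNS traffic are needed, a deployment at the local DNS server could label a device almost as soon as it joins the network and apply device-specific filtering policies immediately.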
{"title":"Rapid IoT device identification at the edge","authors":"O. Thompson, A. Mandalari, H. Haddadi","doi":"10.1145/3488659.3493777","DOIUrl":"https://doi.org/10.1145/3488659.3493777","url":null,"abstract":"Consumer Internet of Things (IoT) devices are increasingly common in everyday homes, from smart speakers to security cameras. Along with their benefits come potential privacy and security threats. To limit these threats we must implement solutions to filter IoT traffic at the edge. To this end the identification of the IoT device is the first natural step. In this paper we demonstrate a novel method of rapid IoT device identification that uses neural networks trained on device DNS traffic that can be captured from a DNS server on the local network. The method identifies devices by fitting a model to the first seconds of DNS second-level-domain traffic following their first connection. Since security and privacy threat detection often operate at a device specific level, rapid identification allows these strategies to be implemented immediately. Through a total of 51,000 rigorous automated experiments, we classify 30 consumer IoT devices from 27 different manufacturers with 82% and 93% accuracy for product type and device manufacturers respectively.","PeriodicalId":343000,"journal":{"name":"Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114267677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}