2022 41st International Symposium on Reliable Distributed Systems (SRDS): Latest Papers

Soter: Deep Learning Enhanced In-Network Attack Detection Based on Programmable Switches

2022 41st International Symposium on Reliable Distributed Systems (SRDS)

Pub Date : 2022-09-01 DOI: 10.1109/SRDS55811.2022.00029

Guorui Xie, Qing Li, Chupeng Cui, Peican Zhu, Dan Zhao, Wanxin Shi, Zhuyun Qi, Yong Jiang, Xianni Xiao

Several deep learning (DL) detectors have been proposed for network attack detection and achieve high accuracy, but they are computationally expensive and struggle to deliver real-time detection on high-speed networks. Programmable switches have recently shown remarkable throughput on production networks, which suggests they could host a timely detector. We therefore present Soter, a DL-enhanced in-network framework for accurate real-time detection. Soter consists of two phases. The first filters packets with a rule-based decision tree running on the Tofino ASIC. The second runs a lightweight neural network on the CPU to inspect the suspicious packets thoroughly. Experiments on a commodity switch show that Soter behaves stably in ten network scenarios with different traffic rates and completes per-flow detection in 0.03 s. Moreover, Soter adapts naturally to distributed deployment across multiple switches, guaranteeing higher total throughput for large data centers and cloud networks.
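The two-phase split described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `FlowStats` features, the thresholds in `switch_filter`, and the weights in `cpu_inspect` are hypothetical, and a single logistic unit stands in for the paper's lightweight neural network. The real first phase runs on the switch ASIC, not in Python.

```python
import math
from dataclasses import dataclass


@dataclass
class FlowStats:
    pkt_count: int        # packets seen so far in the flow
    mean_pkt_len: float   # average packet length in bytes
    syn_ratio: float      # fraction of packets with the SYN flag set


def switch_filter(flow: FlowStats) -> str:
    """Phase 1: cheap rule-based tree (hypothetical thresholds).

    Stands in for the decision tree that would run in the data plane.
    """
    if flow.syn_ratio > 0.8:
        return "suspicious"
    if flow.pkt_count > 1000 and flow.mean_pkt_len < 80:
        return "suspicious"
    return "benign"


# Hypothetical weights and bias; a real model would be trained on traffic data.
WEIGHTS = (0.002, -0.01, 4.0)
BIAS = -2.0


def cpu_inspect(flow: FlowStats) -> bool:
    """Phase 2: a single logistic unit standing in for the lightweight network."""
    z = (WEIGHTS[0] * flow.pkt_count
         + WEIGHTS[1] * flow.mean_pkt_len
         + WEIGHTS[2] * flow.syn_ratio
         + BIAS)
    return 1.0 / (1.0 + math.exp(-z)) > 0.5


def detect(flow: FlowStats) -> str:
    """Only flows flagged by phase 1 pay the cost of phase 2."""
    if switch_filter(flow) == "benign":
        return "forward"
    return "drop" if cpu_inspect(flow) else "forward"
```

The point of the structure is that most traffic exits at `switch_filter`, so the more expensive `cpu_inspect` only sees the small suspicious remainder.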

Citations: 2

Detection and Incentive: A Tampering Detection Mechanism for Object Detection in Edge Computing

2022 41st International Symposium on Reliable Distributed Systems (SRDS)

Pub Date : 2022-09-01 DOI: 10.1109/SRDS55811.2022.00024

Zhihui Zhao, Yicheng Zeng, Jinfang Wang, Hong Li, Hongsong Zhu, Limin Sun

Object detection tasks based on edge computing have received great attention. A common concern that has not been addressed is that an edge node may be unreliable and upload incorrect data to the cloud. Existing work focuses on the consistency of the data transmitted by the edge. However, when the inputs and outputs are inherently different, the authenticity of the data processing has not been addressed. In this paper, we first build a simple model of tampering detection. Then, based on feature insertion and game theory, we propose the tampering detection and economic incentives mechanism (TDEI). In tampering detection, the terminal negotiates a set of features with the cloud and inserts them into the raw data; the cloud then determines whether the results from the edge contain the relevant information. The honesty incentive employs game theory to instill distrust among different edges, preventing them from colluding to thwart the tampering detection. The subjectivity of nodes is also considered. TDEI distributes the tampering detection to all edges and realizes self-detection of edge results. Experimental results on the KITTI dataset show that the detection accuracy is 95% for image and 80% for video when the terminal's additional overhead is below 30% and 20%, respectively. The interference ratio of TDEI to the raw data is about 16% for video and 0% for image. Finally, we discuss the advantages and scalability of TDEI.
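The feature-insertion check can be sketched as a toy Python model. This is only an illustration under strong assumptions: a frame is reduced to a list of object labels, the negotiated features are plain marker labels drawn from a shared seed, and all function names are hypothetical. The abstract does not specify what the inserted features look like or how negotiation is done.

```python
import random


def negotiate_markers(shared_seed, vocabulary, k):
    """Terminal and cloud derive the same k markers from a shared seed (assumption)."""
    rng = random.Random(shared_seed)
    return set(rng.sample(vocabulary, k))


def insert_markers(raw_objects, markers):
    """Terminal side: embed the negotiated markers into the raw data."""
    return list(raw_objects) + sorted(markers)


def honest_edge(frame):
    """An honest edge reports everything it detects, markers included."""
    return set(frame)


def tampering_edge(frame):
    """A tampering edge returns a fabricated result without processing the frame."""
    return {"car"}


def cloud_verify(edge_result, markers):
    """Cloud side: accept only if every negotiated marker appears in the result."""
    return markers <= edge_result
```

Because the edge does not know which labels are markers, a result that skips the real processing is unlikely to contain them, which is what lets the cloud flag it.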

Citations: 1

G-SINC: Global Synchronization Infrastructure for Network Clocks

2022 41st International Symposium on Reliable Distributed Systems (SRDS)

Pub Date : 2022-07-13 DOI: 10.1109/SRDS55811.2022.00021

Marc Frei, Jonghoon Kwon, Seyedali Tabaeiaghdaei, Marc Wyss, C. Lenzen, A. Perrig

Many critical computing applications rely on secure and dependable time that is reliably synchronized across large distributed systems. Today's time synchronization architectures are commonly based on global navigation satellite systems, at the considerable risk of being exposed to outages, malfunction, or attacks against availability and accuracy. This paper describes a practical instantiation of a new global, Byzantine fault-tolerant clock synchronization approach that does not place trust in any single entity and is able to tolerate a fraction of faulty entities while still maintaining synchronization on a global scale among otherwise sovereign network topologies. Leveraging the strong resilience and security properties provided by the path-aware SCION networking architecture, the presented design can be implemented as a backward-compatible active standby solution for existing time synchronization deployments. Through extensive evaluation, we demonstrate that over 94% of time servers reliably minimize the offset of their local clocks to real time in the presence of up to 20% malicious nodes, and all time servers remain synchronized with a skew of only 2 ms even after one year of reference clock outage.
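The abstract does not say how G-SINC combines clock readings. As a general illustration of Byzantine fault-tolerant aggregation, the sketch below uses the classic fault-tolerant midpoint: discard the f lowest and f highest offset measurements, then take the midpoint of what remains. This is a swapped-in textbook technique, not necessarily the paper's algorithm, and the function name and the n ≥ 3f + 1 check are assumptions of this sketch.

```python
def fault_tolerant_midpoint(offsets, f):
    """Combine clock-offset measurements while tolerating up to f faulty peers.

    offsets: measured offsets (e.g. in milliseconds) to each peer's clock.
    f: maximum number of peers assumed to be faulty or malicious.
    """
    if len(offsets) < 3 * f + 1:
        raise ValueError("need at least 3f + 1 measurements to tolerate f faults")
    ordered = sorted(offsets)
    # Drop the f smallest and f largest values; outliers injected by up to
    # f faulty peers cannot survive this trimming on both sides.
    trimmed = ordered[f:len(ordered) - f]
    return (trimmed[0] + trimmed[-1]) / 2
```

For example, `fault_tolerant_midpoint([1.0, 2.0, 3.0, 1000.0], f=1)` returns `2.5`: the outlier of 1000.0 is trimmed away and has no influence on the correction.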

Citations: 1

Babel: A Framework for Developing Performant and Dependable Distributed Protocols

2022 41st International Symposium on Reliable Distributed Systems (SRDS)

Pub Date : 2022-05-04 DOI: 10.1109/SRDS55811.2022.00022

Pedro Fouto, P. Costa, Nuno M. Preguiça, J. Leitao

Prototyping and implementing distributed algorithms, particularly those that address challenges related to fault tolerance and dependability, is a time-consuming task. This is, in part, due to the need to address low-level aspects such as managing communication channels, controlling timeouts or periodic tasks, and dealing with concurrency issues. This has a significant impact on researchers who want to build prototypes for experimental evaluation, on practitioners who want to compare different design alternatives or solutions, and even on practical teaching activities in distributed algorithms courses. In this paper we present Babel, a novel framework to develop, implement, and execute distributed protocols and systems. Babel promotes an event-driven programming and execution model that simplifies the task of translating typical specifications or descriptions of algorithms into performant prototypes, while allowing the programmer to focus on the relevant challenges of these algorithms by transparently handling time-consuming low-level aspects. Furthermore, Babel provides, and allows the definition of, networking components that can capture different network capabilities (e.g., P2P, Client/Server, φ-accrual Failure Detector), making the code mostly independent of the underlying communication aspects. Babel was built to be generic and can be used to implement a wide variety of classes of distributed protocols. We conduct our experimental work with two relevant case studies, a Peer-to-Peer application and a State Machine Replication application, which show the generality and ease of use of Babel and present competitive performance when compared with significantly more complex implementations.
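Babel has its own API, which the abstract does not show. The following is a generic Python sketch of the event-driven style the abstract describes, in which a protocol is written as handlers for timers and messages while a runtime owns scheduling. The names `EventLoop` and `HeartbeatProtocol` are hypothetical and are not Babel's; network delivery is simulated with a fixed delay.

```python
import heapq
from collections import defaultdict


class EventLoop:
    """Minimal discrete-event runtime: protocols only register handlers."""

    def __init__(self):
        self._queue = []                  # entries: (time, seq, event_type, payload)
        self._seq = 0                     # tie-breaker so payloads are never compared
        self._handlers = defaultdict(list)
        self.now = 0.0

    def register(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def schedule(self, delay, event_type, payload=None):
        heapq.heappush(self._queue, (self.now + delay, self._seq, event_type, payload))
        self._seq += 1

    def run(self, until):
        while self._queue and self._queue[0][0] <= until:
            self.now, _, event_type, payload = heapq.heappop(self._queue)
            for handler in self._handlers[event_type]:
                handler(payload)


class HeartbeatProtocol:
    """Protocol logic expressed purely as reactions to events."""

    def __init__(self, loop, period):
        self.loop = loop
        self.period = period
        self.sent = []
        self.received = []
        loop.register("timer", self.on_timer)
        loop.register("message", self.on_message)
        loop.schedule(period, "timer")

    def on_timer(self, _payload):
        self.sent.append(self.loop.now)
        # Simulated network: the heartbeat is delivered back after 0.1 time units.
        self.loop.schedule(0.1, "message", "heartbeat")
        self.loop.schedule(self.period, "timer")   # re-arm the periodic timer

    def on_message(self, payload):
        self.received.append((self.loop.now, payload))
```

Running `HeartbeatProtocol(loop, 1.0)` on a fresh `EventLoop` with `loop.run(until=3.5)` fires the timer at times 1.0, 2.0 and 3.0 and delivers three heartbeats. The protocol class contains no threads, sockets or sleep calls, which is the separation of concerns the abstract argues for.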

Citations: 7

AGIC: Approximate Gradient Inversion Attack on Federated Learning

2022 41st International Symposium on Reliable Distributed Systems (SRDS)

Pub Date : 2022-04-28 DOI: 10.1109/SRDS55811.2022.00012

Jin Xu, Chi Hong, Jiyue Huang, L. Chen, Jérémie Decouchant

Federated learning is a private-by-design distributed learning paradigm where clients train local models on their own data before a central server aggregates their local updates to compute a global model. Depending on the aggregation method used, the local updates are either the gradients or the weights of local learning models, e.g., FedAvg aggregates model weights. Unfortunately, recent reconstruction attacks apply a gradient inversion optimization to the gradient update of a single mini-batch to reconstruct the private data used by clients during training. As the state-of-the-art reconstruction attacks focus solely on a single update, realistic adversarial scenarios are overlooked, such as observation across multiple updates and updates trained from multiple mini-batches. A few studies consider a more challenging adversarial scenario where only model updates based on multiple mini-batches are observable, and resort to computationally expensive simulation to untangle the underlying samples for each local step. In this paper, we propose AGIC, a novel Approximate Gradient Inversion Attack that efficiently and effectively reconstructs images from either model or gradient updates, and across multiple epochs. In a nutshell, AGIC (i) approximates gradient updates of used training samples from model updates to avoid costly simulation procedures, (ii) leverages gradient/model updates collected from multiple epochs, and (iii) assigns increasing weights to layers with respect to the neural network structure for reconstruction quality. We extensively evaluate AGIC on three datasets, namely CIFAR-10, CIFAR-100 and ImageNet. Our results show that AGIC increases the peak signal-to-noise ratio (PSNR) by up to 50% compared to two representative state-of-the-art gradient inversion attacks. Furthermore, AGIC is faster than the state-of-the-art simulation-based attack, e.g., it is 5x faster when attacking FedAvg with 8 local steps in between model updates.
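Two quantities in the abstract lend themselves to a short Python sketch. The first function shows one simple reading of step (i), under the assumption that the client runs plain SGD with a known learning rate: the weight change divided by the learning rate times the number of local steps gives an average gradient. The paper's exact approximation may differ. The second function is the standard PSNR definition used to score reconstructions. Function and parameter names are illustrative.

```python
import math


def approx_gradient(w_before, w_after, lr, local_steps):
    """Average per-step gradient implied by a model update (plain SGD assumed).

    With SGD, w_after = w_before - lr * (sum of per-step gradients), so the
    mean gradient is (w_before - w_after) / (lr * local_steps).
    """
    scale = lr * local_steps
    return [(b - a) / scale for b, a in zip(w_before, w_after)]


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf   # identical signals
    return 10 * math.log10(max_value ** 2 / mse)
```

As a check on the first function: if a client starts at weights `[1.0, 2.0]` and takes 8 steps with learning rate 0.1 and a constant gradient `[0.5, -1.0]`, it ends at `[0.6, 2.8]`, and `approx_gradient` recovers `[0.5, -1.0]` up to floating-point error. With real, varying mini-batch gradients the result is only an average, which is why it is an approximation.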

Citations: 3

D-Cliques: Compensating for Data Heterogeneity with Topology in Decentralized Federated Learning

2022 41st International Symposium on Reliable Distributed Systems (SRDS)

Pub Date : 2021-04-15 DOI: 10.1109/SRDS55811.2022.00011

A. Bellet, Anne-Marie Kermarrec, Erick Lavoie

The convergence speed of machine learning models trained with Federated Learning is significantly affected by heterogeneous data partitions, even more so in a fully decentralized setting without a central server. In this paper, we show that the impact of label distribution skew, an important type of data heterogeneity, can be significantly reduced by carefully designing the underlying communication topology. We present D-Cliques, a novel topology that reduces gradient bias by grouping nodes in sparsely interconnected cliques such that the label distribution in a clique is representative of the global label distribution. We also show how to adapt the updates of decentralized SGD to obtain unbiased gradients and implement an effective momentum with D-Cliques. Our extensive empirical evaluation on MNIST and CIFAR10 validates our design and demonstrates that our approach achieves a convergence speed similar to that of a fully connected topology, while providing a significant reduction in the number of edges and messages. In a 1000-node topology, D-Cliques require 98% fewer edges and 96% fewer total messages, with further possible gains using a small-world topology across cliques.
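The edge savings can be reproduced with a short count. The sketch below assumes cliques of 10 nodes (one plausible choice for 10-class datasets) that are fully connected internally, plus exactly one edge between every pair of cliques. Neither the clique size nor the inter-clique wiring is stated in the abstract, so both are assumptions of this example.

```python
def complete_graph_edges(n):
    """Number of undirected edges in a fully connected graph on n nodes."""
    return n * (n - 1) // 2


def d_cliques_edges(num_nodes, clique_size):
    """Edges when nodes form full cliques joined by one edge per clique pair."""
    if num_nodes % clique_size:
        raise ValueError("num_nodes must be a multiple of clique_size")
    num_cliques = num_nodes // clique_size
    intra = num_cliques * complete_graph_edges(clique_size)   # inside cliques
    inter = complete_graph_edges(num_cliques)                 # between cliques
    return intra + inter


full = complete_graph_edges(1000)        # 499500
sparse = d_cliques_edges(1000, 10)       # 100 * 45 + 4950 = 9450
reduction = 1 - sparse / full            # about 0.981
```

Under these assumptions the topology needs 9,450 edges instead of 499,500, a reduction of about 98.1%, which is consistent with the "98% fewer edges" figure quoted above.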

Citations: 10
