{"title":"Data distribution inference attack in federated learning via reinforcement learning support","authors":"Dongxiao Yu , Hengming Zhang , Yan Huang , Zhenzhen Xie","doi":"10.1016/j.hcc.2024.100235","DOIUrl":null,"url":null,"abstract":"<div><div>Federated Learning (FL) is currently a widely used collaborative learning framework, and the distinguished feature of FL is that the clients involved in training do not need to share raw data, but only transfer the model parameters to share knowledge, and finally get a global model with improved performance. However, recent studies have found that sharing model parameters may still lead to privacy leakage. From the shared model parameters, local training data can be reconstructed and thus lead to a threat to individual privacy and security. We observed that most of the current attacks are aimed at client-specific data reconstruction, while limited attention is paid to the information leakage of the global model. In our work, we propose a novel FL attack based on shared model parameters that can deduce the data distribution of the global model. Different from other FL attacks that aim to infer individual clients’ raw data, the data distribution inference attack proposed in this work shows that the attackers can have the capability to deduce the data distribution information behind the global model. We argue that such information is valuable since the training data behind a well-trained global model indicates the common knowledge of a specific task, such as social networks and e-commerce applications. To implement such an attack, our key idea is to adopt a deep reinforcement learning approach to guide the attack process, where the RL agent adjusts the pseudo-data distribution automatically until it is similar to the ground truth data distribution. By a carefully designed Markov decision proces (MDP) process, our implementation ensures our attack can have stable performance and experimental results verify the effectiveness of our proposed inference attack.</div></div>","PeriodicalId":100605,"journal":{"name":"High-Confidence Computing","volume":"5 1","pages":"Article 100235"},"PeriodicalIF":3.2000,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"High-Confidence Computing","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2667295224000382","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Federated Learning (FL) is a widely used collaborative learning framework whose distinguishing feature is that participating clients do not share raw data; instead, they exchange only model parameters to share knowledge and ultimately obtain a global model with improved performance. However, recent studies have found that sharing model parameters may still lead to privacy leakage: local training data can be reconstructed from the shared parameters, threatening individual privacy and security. We observe that most existing attacks target client-specific data reconstruction, while limited attention has been paid to information leakage from the global model. In this work, we propose a novel FL attack based on shared model parameters that deduces the data distribution underlying the global model. Unlike other FL attacks that aim to infer individual clients' raw data, the data distribution inference attack proposed here shows that an attacker can deduce the data distribution information behind the global model. We argue that such information is valuable, since the training data behind a well-trained global model reflects the common knowledge of a specific task, as in social network and e-commerce applications. To implement the attack, our key idea is to adopt a deep reinforcement learning approach to guide the attack process, in which the RL agent automatically adjusts a pseudo-data distribution until it resembles the ground-truth data distribution. Through a carefully designed Markov decision process (MDP), our implementation achieves stable attack performance, and experimental results verify the effectiveness of the proposed inference attack.
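To make the state/action/reward structure of such an attack concrete, below is a minimal sketch, not the authors' implementation. It assumes a class-label distribution as the state, a mass-transfer action space, and a reward given by similarity between updates produced from pseudo-data and the observed global update; here that reward is mocked by `similarity_to_global_update` with a hidden target, and the deep RL agent of the paper is replaced by a simple epsilon-greedy one-step-lookahead policy for brevity. All names and parameters are hypothetical.

```python
# Sketch of an RL-guided data-distribution inference loop (illustrative only).
import numpy as np

NUM_CLASSES = 10
STEP = 0.02          # probability mass moved by one action
EPISODE_LEN = 200

def similarity_to_global_update(pseudo_dist: np.ndarray) -> float:
    """Hypothetical reward: higher when a model trained on data sampled from
    `pseudo_dist` would produce updates close to the observed global update.
    Mocked here with a fixed target that is unknown to the attacker."""
    target = np.array([0.30, 0.05, 0.05, 0.10, 0.10,
                       0.05, 0.05, 0.10, 0.10, 0.10])
    return -np.abs(pseudo_dist - target).sum()   # negative L1 distance

def apply_action(dist: np.ndarray, action: tuple) -> np.ndarray:
    """Action = (src, dst): move STEP probability mass from class src to dst."""
    src, dst = action
    new = dist.copy()
    moved = min(STEP, new[src])
    new[src] -= moved
    new[dst] += moved
    return new

def run_episode(rng: np.random.Generator, epsilon: float = 0.2) -> np.ndarray:
    """Epsilon-greedy policy over the MDP: state = pseudo distribution,
    action = mass transfer, reward = similarity to the global update."""
    state = np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)   # start from uniform
    actions = [(i, j) for i in range(NUM_CLASSES)
               for j in range(NUM_CLASSES) if i != j]
    for _ in range(EPISODE_LEN):
        if rng.random() < epsilon:
            action = actions[rng.integers(len(actions))]   # explore
        else:
            # Exploit: pick the action with the best one-step reward.
            action = max(actions,
                         key=lambda a: similarity_to_global_update(apply_action(state, a)))
        state = apply_action(state, action)
    return state

if __name__ == "__main__":
    inferred = run_episode(np.random.default_rng(0))
    print("inferred class distribution:", np.round(inferred, 3))
```

In the attack described by the abstract, the mocked reward would instead come from comparing parameter updates induced by the pseudo-data with the shared global model parameters, and the greedy policy would be replaced by a trained deep RL agent.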