Learning Privacy-Preserving Embeddings for Image Data to Be Published

IF 6.6 · CAS Tier 4, Computer Science · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · ACM Transactions on Intelligent Systems and Technology · Pub Date: 2023-09-08 · DOI: 10.1145/3623404

Chu-Chen Li, Cheng-Te Li, Shou-De Lin

Citations: 0

Abstract

Deep learning excels at learning feature representations that deliver strong performance across application domains. Recent work has shown that private attributes of users and patients (e.g., identity, gender, and race) can be accurately inferred from image data. To reduce the risk of privacy leakage, data owners can release embeddings rather than the original images. In this paper, we aim to learn privacy-preserving embeddings from image data. The embeddings must maintain data utility (e.g., preserve performance on a main task such as disease prediction) while preventing the private attributes of data instances from being accurately inferred. In addition, the embeddings should be hard to use for reconstructing the original images. We propose a hybrid method based on multi-task learning to reach this goal. The key idea is twofold. First, we learn a feature encoder that benefits the main task and fools the sensitive task at the same time, via iterative training and feature disentanglement. Second, we incorporate the learning of adversarial examples to degrade sensitive-attribute classification. Experiments on the Multi-Attribute Facial Landmark (MAFL) and NIH Chest X-rays datasets demonstrate the effectiveness of our hybrid method. A set of further studies also shows the usefulness of each model component, the difficulty of data reconstruction, and the performance impact of task correlation.
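
The abstract does not state the loss functions, so the following is only a minimal sketch of one common way to set up the two alternating objectives behind "benefit the main task and fool the sensitive task". It assumes binary labels, binary cross-entropy, and a trade-off weight `lam`; the function names and toy numbers are hypothetical and are not taken from the paper.

```python
import math


def bce(probs, labels):
    """Mean binary cross-entropy of predicted probabilities against 0/1 labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def adversary_loss(sens_probs, sens_labels):
    # Phase 1 (encoder frozen): the sensitive-attribute head tries to be accurate.
    return bce(sens_probs, sens_labels)


def encoder_loss(main_probs, main_labels, sens_probs, sens_labels, lam=1.0):
    # Phase 2 (sensitive head frozen): keep the main task accurate while
    # raising the sensitive head's loss. `lam` is an assumed trade-off weight.
    return bce(main_probs, main_labels) - lam * bce(sens_probs, sens_labels)


# Toy predictions made from two hypothetical embeddings of the same four images.
main_labels = [1, 0, 1, 0]
sens_labels = [1, 1, 0, 0]
main_probs = [0.9, 0.1, 0.9, 0.1]   # main task is solved well in both cases
leaky = [0.9, 0.9, 0.1, 0.1]        # sensitive attribute is easy to read off
private = [0.5, 0.5, 0.5, 0.5]      # sensitive head is at chance

loss_leaky = encoder_loss(main_probs, main_labels, leaky, sens_labels)
loss_private = encoder_loss(main_probs, main_labels, private, sens_labels)
print(loss_leaky, loss_private)
```

Under these assumptions the encoder objective is lower for the embedding whose sensitive predictions sit at chance, which is the direction iterative training would push. Feature disentanglement and adversarial examples, the other components named in the abstract, are not modeled here.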

Source journal

ACM Transactions on Intelligent Systems and Technology (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 9.30

Self-citation rate: 2.00%

Articles published: 131

Journal description: ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST publishes six issues a year. Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.