Inductive and Effective Privacy-preserving Semi-supervised Learning with Harmonic Anchor Mixture

2021 International Symposium on Electrical, Electronics and Information Engineering. Pub Date: 2021-02-19. DOI: 10.1145/3459104.3459187

Zhi Li, Zhoujun Li


Citations: 1

Abstract

Distributed privacy-preserving data mining (DPPDM) has attracted considerable attention. It allows multiple participants to train a model jointly on their combined datasets while preserving data privacy. Many studies have examined semi-supervised learning in DPPDM, which combines labeled and unlabeled data for better performance. However, these studies provide only transductive solutions: they can predict labels for instances in the training set but not for new samples outside it. Moreover, these methods rely on approximate calculations for security reasons, which leads to sub-optimal results and limited effectiveness. In this paper, we propose a mixture-model-based solution for inductive and effective semi-supervised learning in DPPDM. Our approach combines mixture models with graph-based methods to construct an anchor mixture capable of label prediction. We also propose an optimization process that is calculated accurately, rather than approximately, through secure computation protocols to achieve effectiveness. Experiments on synthetic and real-world datasets demonstrate that our proposal outperforms state-of-the-art methods in both transductive and inductive tasks.
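To illustrate why an anchor-based mixture can be inductive, the following is a minimal plaintext sketch and not the authors' algorithm. It assumes Gaussian soft assignments to a fixed set of anchors and a ridge-regularized least-squares fit of per-anchor class scores. It omits the graph-based term, the use of unlabeled data, and every secure computation protocol. All names (`responsibilities`, `fit_anchor_labels`, `predict`, `sigma`, `ridge`) are hypothetical.

```python
import numpy as np

def responsibilities(X, anchors, sigma=1.0):
    """Soft assignment of each row of X to each anchor; rows sum to 1."""
    # Squared Euclidean distance between every point and every anchor.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def fit_anchor_labels(X_labeled, y_labeled, anchors, n_classes,
                      sigma=1.0, ridge=1e-6):
    """Fit per-anchor class scores from labeled points (assumed objective)."""
    Z = responsibilities(X_labeled, anchors, sigma)
    Y = np.eye(n_classes)[y_labeled]  # one-hot targets
    A = Z.T @ Z + ridge * np.eye(anchors.shape[0])
    return np.linalg.solve(A, Z.T @ Y)

def predict(X_new, anchors, anchor_labels, sigma=1.0):
    """Inductive step: any new point is scored through the anchors."""
    Z = responsibilities(X_new, anchors, sigma)
    return (Z @ anchor_labels).argmax(axis=1)

# Toy data: two well-separated clusters, one anchor and one label per cluster.
anchors = np.array([[0.0, 0.0], [5.0, 5.0]])
X_labeled = np.array([[0.0, 0.5], [5.0, 4.5]])
y_labeled = np.array([0, 1])

anchor_labels = fit_anchor_labels(X_labeled, y_labeled, anchors, n_classes=2)

# Points that were never part of the training set.
X_new = np.array([[0.2, -0.1], [4.8, 5.3]])
pred = predict(X_new, anchors, anchor_labels)
print(pred)  # prints [0 1]
```

Because prediction depends only on the anchors and their fitted scores, a sample outside the training set can be labeled without rerunning training, which is the property that a purely transductive method lacks.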
