Selim Ickin, K. Vandikas, Farnaz Moradi, Jalil Taghia, Wenfeng Hu
Venue: 2020 6th IEEE Conference on Network Softwarization (NetSoft)
Publication date: 2020-06-01
DOI: 10.1109/NetSoft48620.2020.9165379
URL: https://doi.org/10.1109/NetSoft48620.2020.9165379
Citation count: 6
Ensemble-based Synthetic Data Synthesis for Federated QoE Modeling
Quality of Experience (QoE) models generalize well only when they are trained on a sufficient amount of user-labeled data paired with measurements of the underlying QoE factors. Obtaining QoE datasets is often costly, since they are preferably collected from many subjects with diverse backgrounds, so dataset sizes and representations end up limited. Models can be improved by sharing and merging the locally collected datasets, but regulations such as GDPR make data sharing difficult, because local user datasets might contain sensitive information about the subjects. A privacy-preserving machine learning approach such as Federated Learning (FL) is a potential candidate: it lets collaborators share QoE models without exposing the ground truth, exchanging only the securely aggregated form of the extracted model parameters. While FL can enable seamless QoE model management, if collaborators do not have the same level of data quality, more iterations of information sharing over the communication channel might be necessary for the models to reach an acceptable accuracy. In this paper, we present LOO (Leave-One-Out), an ensemble-based Bayesian synthetic data generation method for FL, which reduces the training time by 30% and the network footprint in the communication channel by 60%.
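To make the parameter-sharing idea concrete, here is a minimal sketch of one aggregation round in which collaborators send only model parameters, never their labeled data. It assumes plain sample-weighted federated averaging; the abstract does not state which aggregation rule the authors use, and the secure aggregation step and the LOO synthetic data generation are not modeled here. The function name `federated_average` and the toy numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(local_weights: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Combine per-collaborator parameter vectors into one global vector.

    Assumption: sample-weighted averaging (FedAvg-style). Only parameters
    and sample counts are passed in; no raw QoE data is needed.
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per collaborator")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(local_weights[0])
    if any(len(w) != dim for w in local_weights):
        raise ValueError("all parameter vectors must have the same length")
    # Each global parameter is the average of the local parameters,
    # weighted by how many samples each collaborator trained on.
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(dim)
    ]


# Three hypothetical collaborators, each with a two-parameter model.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
counts = [10, 10, 20]
global_weights = federated_average(weights, counts)
print(global_weights)  # [3.5, 4.5]
```

In a full FL loop this aggregation would repeat every round, which is why the number of rounds needed to reach an acceptable accuracy drives both the training time and the network footprint.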