Title: On the Integration of Machine Learning and Array Databases
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00170 | Pages: 1786-1789
Sebastián Villarroya, P. Baumann
Machine learning is increasingly being applied across many different application domains. From cancer detection to weather forecasting, a large number of applications leverage machine learning algorithms to get faster and more accurate results over huge datasets. Although many of these datasets consist mainly of array data, the vast majority of machine learning applications do not use array databases. This tutorial focuses on the integration of machine learning algorithms and array databases. By implementing machine learning algorithms inside an array database, users can combine the database's native, efficient array processing with machine learning methods to perform fast and accurate array data analytics.
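To make the premise concrete, here is a minimal sketch (not from the tutorial itself) of a machine learning algorithm expressed purely as whole-array operations, the style of computation an array database such as rasdaman or SciDB evaluates natively over tiled, on-disk arrays. NumPy stands in for the array engine, and all names and values below are illustrative.

```python
# Minimal sketch: linear regression by gradient descent, written as
# whole-array expressions an array engine could push down to storage.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 4))                 # feature array
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=10_000)

w = np.zeros(4)                                  # model weights
lr = 0.1
for _ in range(200):
    grad = X.T @ (X @ w - y) / len(y)            # one array-algebra expression
    w -= lr * grad

print(np.round(w, 2))                            # ~ [ 2.  -1.   0.5  3. ]
```

The point of the sketch is that each training step is a single array-algebra expression, so an array database can execute it close to the data instead of exporting arrays to an external ML runtime.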
{"title":"On the Integration of Machine Learning and Array Databases","authors":"Sebastián Villarroya, P. Baumann","doi":"10.1109/ICDE48307.2020.00170","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00170","url":null,"abstract":"Machine Learning is increasingly being applied to many different application domains. From cancer detection to weather forecast, a large number of different applications leverage machine learning algorithms to get faster and more accurate results over huge datasets. Although many of these datasets are mainly composed of array data, a vast majority of machine learning applications do not use array databases. This tutorial focuses on the integration of machine learning algorithms and array databases. By implementing machine learning algorithms in array databases users can boost the native efficient array data processing with machine learning methods to perform accurate and fast array data analytics.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"24 1","pages":"1786-1789"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74754436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A Novel Approach to Learning Consensus and Complementary Information for Multi-View Data Clustering
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00080 | Pages: 865-876
Khanh Luong, R. Nayak
Effective methods need to be developed to deal with the multi-faceted nature of multi-view data. We design a factorization-based method whose loss function simultaneously learns two components encoding the consensus and complementary information present in multi-view data, using Coupled Matrix Factorization (CMF) and Non-negative Matrix Factorization (NMF). We propose a novel optimal manifold for multi-view data: the manifold on which the views agree most closely, embedded in the high-dimensional multi-view data. A new complementary-enhancing term is added to the loss function to reinforce the complementary information inherent in each view. Extensive experiments on diverse datasets, benchmarked against state-of-the-art multi-view clustering methods, demonstrate the effectiveness of the proposed method in obtaining accurate clustering solutions.
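The abstract does not state the exact loss, so the following is only a generic form of such a joint multi-view factorization objective, written to make the structure concrete; every symbol and trade-off parameter here is an assumption, not the paper's formulation.

```latex
% Illustrative multi-view factorization objective (assumed form):
% X^(v) is view v, W^(v)H^(v) its factorization, H* a consensus factor,
% L^(v) a graph Laplacian encoding the view's manifold, and
% alpha, beta trade-off weights for consensus and manifold terms.
\min_{W^{(v)},\,H^{(v)},\,H^{*} \ge 0}\;
\sum_{v=1}^{V} \Big(
  \big\|X^{(v)} - W^{(v)} H^{(v)}\big\|_F^2
  + \alpha \,\big\|H^{(v)} - H^{*}\big\|_F^2
  + \beta \,\mathrm{tr}\!\big(H^{(v)} L^{(v)} {H^{(v)}}^{\!\top}\big)
\Big)
```

In this generic shape, the first term fits each view, the second pulls view-specific factors toward a consensus, and the third respects each view's manifold; the paper adds a further complementary-enhancing term on top of this kind of structure.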
{"title":"A Novel Approach to Learning Consensus and Complementary Information for Multi-View Data Clustering","authors":"Khanh Luong, R. Nayak","doi":"10.1109/ICDE48307.2020.00080","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00080","url":null,"abstract":"Effective methods are required to be developed that can deal with the multi-faceted nature of the multi-view data. We design a factorization-based loss function-based method to simultaneously learn two components encoding the consensus and complementary information present in multi-view data by using the Coupled Matrix Factorization (CMF) and Non-negative Matrix Factorization (NMF). We propose a novel optimal manifold for multi-view data which is the most consensed manifold embedded in the high-dimensional multi-view data. A new complementary enhancing term is added in the loss function to enhance the complementary information inherent in each view. An extensive experiment with diverse datasets, benchmarking the state-of-the-art multi-view clustering methods, has demonstrated the effectiveness of the proposed method in obtaining accurate clustering solution.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"6 1","pages":"865-876"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73425873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Message from the ICDE 2020 Chairs
Pub Date: 2020-04-01 | DOI: 10.1109/icde48307.2020.00005
Title: Two-Level Data Compression using Machine Learning in Time Series Database
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00119 | Pages: 1333-1344
Xinyang Yu, Yanqing Peng, Feifei Li, Sheng Wang, Xiaowei Shen, Huijun Mai, Yue Xie
The explosion of time series data has driven the development of time series databases. To reduce storage overhead in these systems, data compression is widely adopted. Most existing compression algorithms utilize the overall characteristics of the entire time series to achieve a high compression ratio, but ignore local contexts around individual points. As a result, they are effective for certain data patterns but may suffer from the inherent pattern changes in real-world time series. A compression method that consistently achieves a high compression ratio in the presence of pattern diversity is therefore strongly desired. In this paper, we propose a two-level compression model that selects a proper compression scheme for each individual point, so that diverse patterns can be captured at a fine granularity. Based on this model, we design and implement the AMMMO framework, in which a set of control parameters is defined to distill and categorize data patterns. At the top level, we evaluate each sub-sequence to fill in these parameters, generating a set of compression scheme candidates (i.e., major mode selection). At the bottom level, we choose the best scheme from these candidates for each data point (i.e., sub-mode selection). To handle diverse data patterns effectively, we introduce a reinforcement learning based approach to learn parameter values automatically. Our experimental evaluation shows that our approach improves the compression ratio by up to 120% (with an average of 50%) compared to other time-series compression methods.
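A minimal sketch of the two-level idea, with toy encoders and invented cost functions (the real AMMMO parameter space, scheme set, and reinforcement-learned selection are far richer than this):

```python
# Toy two-level scheme selection: the top level shortlists schemes per
# sub-sequence (major mode), the bottom level picks per point (sub-mode).
import math

def bits_delta(prev, x):              # cost of delta encoding (toy)
    return max(1, int(math.log2(abs(x - prev) + 1)) + 1)

def bits_xor(prev, x):                # cost of Gorilla-style XOR (toy)
    return max(1, bin(int(x) ^ int(prev)).count("1") + 1)

def bits_rle(prev, x):                # cost of run-length encoding (toy)
    return 1 if x == prev else 64

SCHEMES = {"delta": bits_delta, "xor": bits_xor, "rle": bits_rle}

def major_mode(subseq, k=2):
    """Top level: keep the k schemes with lowest total cost on the block."""
    totals = {name: sum(f(a, b) for a, b in zip(subseq, subseq[1:]))
              for name, f in SCHEMES.items()}
    return sorted(totals, key=totals.get)[:k]

def compress(subseq):
    """Bottom level: for each point, pick the cheapest shortlisted scheme."""
    candidates = major_mode(subseq)
    return [(min(candidates, key=lambda n: SCHEMES[n](prev, x)),
             min(SCHEMES[n](prev, x) for n in candidates))
            for prev, x in zip(subseq, subseq[1:])]

print(compress([10, 11, 13, 13, 40, 41]))
```

Restricting the per-point choice to a block-level shortlist is what keeps the per-point metadata small while still adapting to local pattern changes.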
{"title":"Two-Level Data Compression using Machine Learning in Time Series Database","authors":"Xinyang Yu, Yanqing Peng, Feifei Li, Sheng Wang, Xiaowei Shen, Huijun Mai, Yue Xie","doi":"10.1109/ICDE48307.2020.00119","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00119","url":null,"abstract":"The explosion of time series advances the development of time series databases. To reduce storage overhead in these systems, data compression is widely adopted. Most existing compression algorithms utilize the overall characteristics of the entire time series to achieve high compression ratio, but ignore local contexts around individual points. In this way, they are effective for certain data patterns, and may suffer inherent pattern changes in real-world time series. It is therefore strongly desired to have a compression method that can always achieve high compression ratio in the existence of pattern diversity.In this paper, we propose a two-level compression model that selects a proper compression scheme for each individual point, so that diverse patterns can be captured at a fine granularity. Based on this model, we design and implement AMMMO framework, where a set of control parameters is defined to distill and categorize data patterns. At the top level, we evaluate each sub-sequence to fill in these parameters, generating a set of compression scheme candidates (i.e., major mode selection). At the bottom level, we choose the best scheme from these candidates for each data point respectively (i.e., sub-mode selection). To effectively handle diverse data patterns, we introduce a reinforcement learning based approach to learn parameter values automatically. Our experimental evaluation shows that our approach improves compression ratio by up to 120% (with an average of 50%), compared to other time-series compression methods.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"8 1","pages":"1333-1344"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83698489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: SFour: A Protocol for Cryptographically Secure Record Linkage at Scale
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00031 | Pages: 277-288
Basit Khurram, F. Kerschbaum
The prevalence of various (and increasingly large) datasets presents the challenging problem of discovering common entities dispersed across disparate datasets. Solutions to the private record linkage problem (PRL) aim to enable such explorations of datasets in a secure manner. A two-party PRL protocol allows two parties to determine for which entities they each possess a record (either an exact matching record or a fuzzy matching record) in their respective datasets — without revealing to one another information about any entities for which they do not both possess records. Although several solutions have been proposed to solve the PRL problem, no current solution offers a fully cryptographic security guarantee while maintaining both high accuracy of output and subquadratic runtime efficiency. To this end, we propose the first known efficient PRL protocol that runs in subquadratic time, provides high accuracy, and guarantees cryptographic security in the semi-honest security model.
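The cryptographic machinery is beyond a short snippet, but the usual source of subquadratic behavior in record linkage, comparing only records that share a blocking key instead of all pairs, can be sketched in the clear. Everything below is a plaintext illustration with toy blocking keys, not the secure protocol itself.

```python
# Plaintext sketch of blocking for subquadratic record linkage. A
# protocol like SFour would carry out such steps under cryptographic
# protection; here everything is in the clear for illustration.
from collections import defaultdict
from itertools import product

def blocking_keys(name: str):
    # Toy keys: a 2-char prefix plus 3-grams (real systems use LSH so
    # similar records share at least one block with high probability).
    n = name.lower().replace(" ", "")
    return {n[:2]} | {n[i:i + 3] for i in range(0, len(n) - 2, 3)}

def link(side_a, side_b, sim, threshold=0.5):
    blocks_a, blocks_b = defaultdict(list), defaultdict(list)
    for r in side_a:
        for k in blocking_keys(r): blocks_a[k].append(r)
    for r in side_b:
        for k in blocking_keys(r): blocks_b[k].append(r)
    seen, matches = set(), []
    for k in blocks_a.keys() & blocks_b.keys():  # only co-blocked pairs
        for a, b in product(blocks_a[k], blocks_b[k]):
            if (a, b) not in seen:
                seen.add((a, b))
                if sim(a, b) >= threshold:
                    matches.append((a, b))
    return matches

jaccard = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
print(link(["alice smith", "bob jones"], ["alice smyth", "carol wu"], jaccard))
```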
{"title":"SFour: A Protocol for Cryptographically Secure Record Linkage at Scale","authors":"Basit Khurram, F. Kerschbaum","doi":"10.1109/ICDE48307.2020.00031","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00031","url":null,"abstract":"The prevalence of various (and increasingly large) datasets presents the challenging problem of discovering common entities dispersed across disparate datasets. Solutions to the private record linkage problem (PRL) aim to enable such explorations of datasets in a secure manner. A two-party PRL protocol allows two parties to determine for which entities they each possess a record (either an exact matching record or a fuzzy matching record) in their respective datasets — without revealing to one another information about any entities for which they do not both possess records. Although several solutions have been proposed to solve the PRL problem, no current solution offers a fully cryptographic security guarantee while maintaining both high accuracy of output and subquadratic runtime efficiency. To this end, we propose the first known efficient PRL protocol that runs in subquadratic time, provides high accuracy, and guarantees cryptographic security in the semi-honest security model.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"83 1","pages":"277-288"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83065431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Optimizing Knowledge Graphs through Voting-based User Feedback
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00043 | Pages: 421-432
Ruida Yang, Xin Lin, Jianliang Xu, Yan Yang, Liang He
Knowledge graphs have been used in a wide range of applications to support search, recommendation, and question answering (Q&A). For example, in Q&A systems, given a new question, we may use a knowledge graph to automatically identify the most suitable answers based on similarity evaluation. However, such systems may suffer from two major limitations. First, the knowledge graph constructed based on source data may contain errors. Second, the knowledge graph may become out of date and cannot quickly adapt to new knowledge. To address these issues, in this paper, we propose an interactive framework that refines and optimizes knowledge graphs through user votes. We develop an efficient similarity evaluation notion, called extended inverse P-distance, based on which the graph optimization problem can be formulated as a signomial geometric programming problem. We then propose a basic single-vote solution and a more advanced multi-vote solution for graph optimization. We also propose a split-and-merge optimization strategy to scale up the multi-vote solution. Extensive experiments based on real-life and synthetic graphs demonstrate the effectiveness and efficiency of our proposed framework.
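The paper's extended inverse P-distance and its signomial geometric programming formulation are not reproduced here; the sketch below only illustrates the surrounding feedback loop, in which an up-vote strengthens and a down-vote weakens the edges on paths that produced an answer. The multiplicative update rule and the graph contents are assumptions for illustration.

```python
# Toy vote-driven reweighting of a knowledge graph (illustrative only;
# the paper instead optimizes weights by solving a signomial geometric
# program over an extended inverse P-distance similarity).
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("question:capital", "France", 1.0),
    ("France", "Paris", 1.0),
    ("France", "Lyon", 1.0),
])

def vote(G, query, answer, up: bool, eta=0.2, cutoff=3):
    """Multiplicatively adjust every edge on every short path from the
    query node to the voted answer."""
    factor = (1 + eta) if up else (1 - eta)
    for path in nx.all_simple_paths(G, query, answer, cutoff=cutoff):
        for u, v in zip(path, path[1:]):
            G[u][v]["weight"] *= factor

vote(G, "question:capital", "Paris", up=True)    # user confirms Paris
vote(G, "question:capital", "Lyon", up=False)    # user rejects Lyon
print(G["France"]["Paris"]["weight"], G["France"]["Lyon"]["weight"])
```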
{"title":"Optimizing Knowledge Graphs through Voting-based User Feedback","authors":"Ruida Yang, Xin Lin, Jianliang Xu, Yan Yang, Liang He","doi":"10.1109/ICDE48307.2020.00043","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00043","url":null,"abstract":"Knowledge graphs have been used in a wide range of applications to support search, recommendation, and question answering (Q&A). For example, in Q&A systems, given a new question, we may use a knowledge graph to automatically identify the most suitable answers based on similarity evaluation. However, such systems may suffer from two major limitations. First, the knowledge graph constructed based on source data may contain errors. Second, the knowledge graph may become out of date and cannot quickly adapt to new knowledge. To address these issues, in this paper, we propose an interactive framework that refines and optimizes knowledge graphs through user votes. We develop an efficient similarity evaluation notion, called extended inverse P-distance, based on which the graph optimization problem can be formulated as a signomial geometric programming problem. We then propose a basic single-vote solution and a more advanced multi-vote solution for graph optimization. We also propose a split-and-merge optimization strategy to scale up the multi-vote solution. Extensive experiments based on real-life and synthetic graphs demonstrate the effectiveness and efficiency of our proposed framework.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"1 1","pages":"421-432"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77809538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Data Sentinel: A Declarative Production-Scale Data Validation Platform
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00140 | Pages: 1579-1590
A. Swami, Sriram Vasudevan, Joojay Huyn
Many organizations process big data for important business operations and decisions. Hence, data quality greatly affects their success. Data quality problems continue to be widespread, costing US businesses an estimated $600 billion annually. To date, addressing data quality in production environments still poses many challenges: easily defining properties of high-quality data; validating production-scale data in a timely manner; debugging poor-quality data; designing data quality solutions to be easy to use, understand, and operate; and designing data quality solutions that integrate easily with other systems. Current data validation solutions do not comprehensively address these challenges. To address data quality in production environments at LinkedIn, we developed Data Sentinel, a declarative production-scale data validation platform. In a simple and well-structured configuration, users declaratively specify the desired data checks. Data Sentinel then performs these checks and writes the results to an easily understandable report. Furthermore, Data Sentinel provides well-defined schemas for the configuration and report, making it easy for other systems to interface or integrate with Data Sentinel. To make Data Sentinel even easier to use, understand, and operate in production environments, we provide Data Sentinel Service (DSS), a complementary system that helps specify data checks; schedule, deploy, and tune data validation jobs; and understand data checking results. The contributions of this paper include: 1) Data Sentinel, a declarative production-scale data validation platform successfully deployed at LinkedIn; 2) a generic design for building and deploying similar systems in production environments; and 3) experiences and lessons learned that can benefit practitioners with similar objectives.
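Since the abstract centers on declaratively specified checks, here is a hypothetical miniature of that workflow; the field names and check vocabulary below are invented for illustration and are not Data Sentinel's actual configuration schema.

```python
# Hypothetical declarative check config plus a tiny runner, to make the
# "declare checks, get a report" workflow concrete.
checks = [
    {"column": "member_id", "check": "not_null"},
    {"column": "member_id", "check": "unique"},
    {"column": "age",       "check": "in_range", "min": 0, "max": 130},
]

def run_checks(rows, checks):
    report = []
    for c in checks:
        col = [r.get(c["column"]) for r in rows]
        if c["check"] == "not_null":
            ok = all(v is not None for v in col)
        elif c["check"] == "unique":
            ok = len(col) == len(set(col))
        elif c["check"] == "in_range":
            ok = all(v is None or c["min"] <= v <= c["max"] for v in col)
        report.append({**c, "passed": ok})
    return report

rows = [{"member_id": 1, "age": 34}, {"member_id": 2, "age": None}]
for line in run_checks(rows, checks):
    print(line)
```

Because both the check specification and the report are plain structured data, other systems can generate configs and consume results without touching the validation engine, which is the integration property the paper emphasizes.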
{"title":"Data Sentinel: A Declarative Production-Scale Data Validation Platform","authors":"A. Swami, Sriram Vasudevan, Joojay Huyn","doi":"10.1109/ICDE48307.2020.00140","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00140","url":null,"abstract":"Many organizations process big data for important business operations and decisions. Hence, data quality greatly affects their success. Data quality problems continue to be widespread, costing US businesses an estimated $600 billion annually. To date, addressing data quality in production environments still poses many challenges: easily defining properties of high-quality data; validating production-scale data in a timely manner; debugging poor quality data; designing data quality solutions to be easy to use, understand, and operate; and designing data quality solutions to easily integrate with other systems. Current data validation solutions do not comprehensively address these challenges. To address data quality in production environments at LinkedIn, we developed Data Sentinel, a declarative production-scale data validation platform. In a simple and well-structured configuration, users declaratively specify the desired data checks. Then, Data Sentinel performs these data checks and writes the results to an easily understandable report. Furthermore, Data Sentinel provides well-defined schemas for the configuration and report. This makes it easy for other systems to interface or integrate with Data Sentinel. To make Data Sentinel even easier to use, understand, and operate in production environments, we provide Data Sentinel Service (DSS), a complementary system to help specify data checks, schedule, deploy, and tune data validation jobs, and understand data checking results. The contributions of this paper include the following: 1) Data Sentinel, a declarative production-scale data validation platform successfully deployed at LinkedIn 2) A generic design to build and deploy similar systems for production environments 3) Experiences and lessons learned that can benefit practitioners with similar objectives.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"13 1","pages":"1579-1590"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83360330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00103 | Pages: 1141-1152
Mo Li, F. Choudhury, Renata Borovica-Gajic, Zhiqiong Wang, Junchang Xin, Jianxin Li
SimRank is an important metric for measuring the similarity of nodes in graph data analysis. The problem of SimRank computation has been studied extensively; however, no existing work provides a unified algorithm that supports SimRank computation on both static and temporal graphs. In this work, we first propose CrashSim, an index-free algorithm for single-source SimRank computation in static graphs. CrashSim provides provable approximation guarantees for its results in an efficient way. In addition, since real-life graphs are often represented as temporal graphs, CrashSim enables efficient computation of SimRank in temporal graphs. We formally define two typical SimRank queries in temporal graphs and solve them by developing an efficient CrashSim-based algorithm, called CrashSim-T. An extensive experimental evaluation on five real-life and synthetic datasets shows that CrashSim and CrashSim-T improve the efficiency of state-of-the-art SimRank algorithms by about 30%, while achieving a result-set precision of about 97%.
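For intuition, single-source SimRank can be estimated index-free through the classic coupled reverse-random-walk view: two reverse walks, started at the query node and a candidate node, that first meet at step t contribute c^t. The sketch below implements that standard Monte Carlo estimator, not CrashSim's own algorithm or its approximation guarantees.

```python
# Index-free Monte Carlo estimation of single-source SimRank on a
# static graph via coupled reverse random walks.
import random

def simrank_single_source(in_nbrs, u, c=0.6, walks=2000, depth=10):
    """in_nbrs: node -> list of in-neighbors. Returns estimates of
    s(u, v) for all v: walks that first meet at step t add c**t."""
    est = {}
    for v in in_nbrs:
        if v == u:
            est[v] = 1.0
            continue
        hits = 0.0
        for _ in range(walks):
            a, b = u, v
            for t in range(1, depth + 1):
                if not in_nbrs[a] or not in_nbrs[b]:
                    break                     # a walk got stuck
                a = random.choice(in_nbrs[a])
                b = random.choice(in_nbrs[b])
                if a == b:                    # first meeting at step t
                    hits += c ** t
                    break
        est[v] = hits / walks
    return est

g = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(simrank_single_source(g, "b"))          # s(b, c) is ~0.6 here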
{"title":"CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs","authors":"Mo Li, F. Choudhury, Renata Borovica-Gajic, Zhiqiong Wang, Junchang Xin, Jianxin Li","doi":"10.1109/ICDE48307.2020.00103","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00103","url":null,"abstract":"SimRank is a significant metric to measure the similarity of nodes in graph data analysis. The problem of SimRank computation has been studied extensively, however there is no existing work that can provide one unified algorithm to support the SimRank computation both on static and temporal graphs. In this work, we first propose CrashSim, an index-free algorithm for single-source SimRank computation in static graphs. CrashSim can provide provable approximation guarantees for the computational results in an efficient way. In addition, as the reallife graphs are often represented as temporal graphs, CrashSim enables efficient computation of SimRank in temporal graphs. We formally define two typical SimRank queries in temporal graphs, and then solve them by developing an efficient algorithm based on CrashSim, called CrashSim-T. From the extensive experimental evaluation using five real-life and synthetic datasets, it can be seen that the CrashSim algorithm and CrashSim-T algorithm substantially improve the efficiency of the state-of-the-art SimRank algorithms by about 30%, while achieving the precision of the result set with about 97%.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"25 1","pages":"1141-1152"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89364839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: SVkNN: Efficient Secure and Verifiable k-Nearest Neighbor Query on the Cloud Platform
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00029 | Pages: 253-264
Ningning Cui, Xiaochun Yang, Bin Wang, Jianxin Li, Guoren Wang
With the boom in cloud computing, data outsourcing in location-based services is proliferating and has attracted increasing interest from research communities and commercial applications. Nevertheless, since the cloud server may be untrusted and even malicious, concerns about data security and result integrity have risen sharply. However, little existing work assures both data security and result integrity in a unified way. In this paper, we study the problem of secure and verifiable k-nearest-neighbor query (SVkNN). To support SVkNN, we first propose a novel unified structure, called the verifiable and secure index (VSI). Based on this, we devise a series of secure protocols to facilitate query processing and develop a compact verification strategy. Given an SVkNN query, our proposed solution not only answers the query efficiently but also guarantees: 1) preserving the privacy of the data, query, results, and access patterns; and 2) authenticating the correctness and completeness of the results without leaking confidentiality. Finally, we formally prove the security, analyze the complexity, and empirically evaluate and demonstrate the performance and feasibility of our proposed approaches.
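One slice of the problem, authenticating that returned points are genuine data (correctness), can be illustrated with a standard Merkle-tree inclusion proof; completeness of the result set and privacy of data, query, and access patterns require the paper's VSI and secure protocols, so the sketch below is a generic ingredient, not SVkNN itself.

```python
# Toy result authentication: the data owner publishes a Merkle root;
# the server returns points with inclusion proofs so the client can
# verify they are genuine. Correctness only -- not completeness or
# privacy, which need the paper's full machinery.
import hashlib

H = lambda b: hashlib.sha256(b).digest()

def build_tree(leaves):
    level, tree = [H(l) for l in leaves], []
    tree.append(level)
    while len(level) > 1:
        if len(level) % 2: level = level + [level[-1]]   # pad odd levels
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        tree.append(level)
    return tree

def prove(tree, i):
    proof = []
    for level in tree[:-1]:
        if len(level) % 2: level = level + [level[-1]]
        proof.append((level[i ^ 1], i % 2))   # (sibling hash, node is right child?)
        i //= 2
    return proof

def verify(root, leaf, proof):
    h = H(leaf)
    for sib, is_right in proof:
        h = H(sib + h) if is_right else H(h + sib)
    return h == root

points = [b"(1,2)", b"(3,1)", b"(5,5)", b"(9,0)"]
tree = build_tree(points)
root = tree[-1][0]
print(verify(root, points[2], prove(tree, 2)))   # True
```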
{"title":"SVkNN: Efficient Secure and Verifiable k-Nearest Neighbor Query on the Cloud Platform*","authors":"Ningning Cui, Xiaochun Yang, Bin Wang, Jianxin Li, Guoren Wang","doi":"10.1109/ICDE48307.2020.00029","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00029","url":null,"abstract":"With the boom in cloud computing, data outsourcing in location-based services is proliferating and has attracted increasing interest from research communities and commercial applications. Nevertheless, since the cloud server is probably both untrusted and malicious, concerns of data security and result integrity have become on the rise sharply. However, there exist little work that can commendably assure the data security and result integrity using a unified way. In this paper, we study the problem of secure and verifiable k nearest neighbor query (SVkNN). To support SVkNN, we first propose a novel unified structure, called verifiable and secure index (VSI). Based on this, we devise a series of secure protocols to facilitate query processing and develop a compact verification strategy. Given an SVkNN query, our proposed solution can not merely answer the query efficiently while can guarantee: 1) preserving the privacy of data, query, result and access patterns; 2) authenticating the correctness and completeness of the results without leaking the confidentiality. Finally, the formal security analysis and complexity analysis are theoretically proven and the performance and feasibility of our proposed approaches are empirically evaluated and demonstrated.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"7 1","pages":"253-264"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86526204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Practical Anonymous Subscription with Revocation Based on Broadcast Encryption
Pub Date: 2020-04-01 | DOI: 10.1109/ICDE48307.2020.00028 | Pages: 241-252
X. Yi, Russell Paulet, E. Bertino, Fang-Yu Rao
In this paper we consider the problem of a client who wishes to subscribe to a product or service provided by a server while maintaining anonymity. At the same time, the server must be able to authenticate the client as a genuine user and to discontinue (or revoke) the client’s access if the subscription fees are not paid. Current solutions to this problem are typically constructed from some combination of blind signature or zero-knowledge proof techniques, which do not directly support client revocation (that is, revoking a user before their secret value expires). In this paper, we present a solution based on the broadcast encryption scheme suggested by Boneh et al., by which the server can broadcast a secret to a group of legitimate clients. Our solution allows a registered client to log into the server anonymously and also supports client revocation by the server. It can be used in many applications, such as location-based queries. We formally define a model for our anonymous subscription protocol and prove the security of our solution under this model. In addition, we present experimental results from an implementation of our protocol, demonstrating that it is practical.
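To show the revocation workflow, here is a naive stand-in for broadcast encryption in which the server encrypts a fresh session secret once per non-revoked client. Boneh et al.'s scheme achieves the same effect with constant-size ciphertexts via pairings; the hash-derived one-time pad below is purely illustrative and not the paper's construction.

```python
# Naive stand-in for broadcast encryption, showing revocation: the
# server broadcasts a fresh session secret that only non-revoked
# clients can recover. Cost here is linear in the member count; the
# real scheme compresses the broadcast via pairings.
import secrets, hashlib

def kdf(key: bytes, nonce: bytes) -> bytes:
    return hashlib.sha256(key + nonce).digest()

client_keys = {f"client{i}": secrets.token_bytes(32) for i in range(4)}
revoked = {"client2"}                        # e.g., unpaid subscription

# Server broadcast: one header entry per non-revoked client.
session = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
header = {cid: bytes(a ^ b for a, b in zip(session, kdf(k, nonce)))
          for cid, k in client_keys.items() if cid not in revoked}

def recover(cid):
    if cid not in header:
        return None                          # revoked: cannot decrypt
    pad = kdf(client_keys[cid], nonce)
    return bytes(a ^ b for a, b in zip(header[cid], pad))

print(recover("client1") == session)         # True
print(recover("client2"))                    # None: access revoked
```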
{"title":"Practical Anonymous Subscription with Revocation Based on Broadcast Encryption","authors":"X. Yi, Russell Paulet, E. Bertino, Fang-Yu Rao","doi":"10.1109/ICDE48307.2020.00028","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00028","url":null,"abstract":"In this paper we consider the problem where a client wishes to subscribe to some product or service provided by a server, but maintain their anonymity. At the same time, the server must be able to authenticate the client as a genuine user and be able to discontinue (or revoke) the client’s access if the subscription fees are not paid. Current solutions for this problem are typically constructed using some combination of blind signature or zero-knowledge proof techniques, which do not directly support client revocation (that is, revoking a user before expiry of their secret value). In this paper, we present a solution for this problem on the basis of the broadcast encryption scheme, suggested by Boneh et al., by which the server can broadcast a secret to a group of legitimate clients. Our solution allows the registered client to log into the server anonymously and also supports client revocation by the server. Our solution can be used in many applications, such as location-based queries. We formally define a model for our anonymous subscription protocol and prove the security of our solution under this model. In addition, we present experimental results from an implementation of our protocol. These experimental results demonstrate that our protocol is practical.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"3 1","pages":"241-252"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87917424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}