
2020 IEEE 36th International Conference on Data Engineering (ICDE): Latest Publications

On the Integration of Machine Learning and Array Databases
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00170
Sebastián Villarroya, P. Baumann
Machine learning is increasingly being applied to many different application domains. From cancer detection to weather forecasting, a large number of applications leverage machine learning algorithms to get faster and more accurate results over huge datasets. Although many of these datasets are mainly composed of array data, the vast majority of machine learning applications do not use array databases. This tutorial focuses on the integration of machine learning algorithms and array databases. By implementing machine learning algorithms in array databases, users can combine the database's natively efficient array processing with machine learning methods to perform accurate and fast array data analytics.
{"title":"On the Integration of Machine Learning and Array Databases","authors":"Sebastián Villarroya, P. Baumann","doi":"10.1109/ICDE48307.2020.00170","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00170","url":null,"abstract":"Machine Learning is increasingly being applied to many different application domains. From cancer detection to weather forecast, a large number of different applications leverage machine learning algorithms to get faster and more accurate results over huge datasets. Although many of these datasets are mainly composed of array data, a vast majority of machine learning applications do not use array databases. This tutorial focuses on the integration of machine learning algorithms and array databases. By implementing machine learning algorithms in array databases users can boost the native efficient array data processing with machine learning methods to perform accurate and fast array data analytics.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"24 1","pages":"1786-1789"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74754436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
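The tutorial's own systems are not reproduced here. As a hedged illustration of the underlying idea (expressing a machine learning step as bulk array operations, so it can run next to the array data), the following NumPy sketch fits a linear regression over a stack of gridded observations; the shapes, variable names, and synthetic data are all hypothetical.

```python
import numpy as np

# Hypothetical array dataset: 1000 gridded observations, each a 16x16 field,
# plus one scalar target per observation.
rng = np.random.default_rng(0)
X_grid = rng.normal(size=(1000, 16, 16))            # array-valued features
y = X_grid.mean(axis=(1, 2)) * 3.0 + rng.normal(scale=0.1, size=1000)

# Express the learning step as pure array algebra (the kind of bulk operation
# an array database evaluates natively): flatten each grid, add a bias column,
# and solve the least-squares problem.
X = X_grid.reshape(len(X_grid), -1)
X = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Prediction is again a single array operation over the stored arrays.
y_hat = X @ w
print("training RMSE:", float(np.sqrt(np.mean((y - y_hat) ** 2))))
```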
A Novel Approach to Learning Consensus and Complementary Information for Multi-View Data Clustering
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00080
Khanh Luong, R. Nayak
Effective methods need to be developed to deal with the multi-faceted nature of multi-view data. We design a factorization-based method with a loss function that simultaneously learns two components encoding the consensus and complementary information present in multi-view data, using Coupled Matrix Factorization (CMF) and Non-negative Matrix Factorization (NMF). We propose a novel optimal manifold for multi-view data, which is the most consensed manifold embedded in the high-dimensional multi-view data. A new complementary enhancing term is added to the loss function to enhance the complementary information inherent in each view. An extensive experiment on diverse datasets, benchmarking against state-of-the-art multi-view clustering methods, demonstrates the effectiveness of the proposed method in obtaining accurate clustering solutions.
{"title":"A Novel Approach to Learning Consensus and Complementary Information for Multi-View Data Clustering","authors":"Khanh Luong, R. Nayak","doi":"10.1109/ICDE48307.2020.00080","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00080","url":null,"abstract":"Effective methods are required to be developed that can deal with the multi-faceted nature of the multi-view data. We design a factorization-based loss function-based method to simultaneously learn two components encoding the consensus and complementary information present in multi-view data by using the Coupled Matrix Factorization (CMF) and Non-negative Matrix Factorization (NMF). We propose a novel optimal manifold for multi-view data which is the most consensed manifold embedded in the high-dimensional multi-view data. A new complementary enhancing term is added in the loss function to enhance the complementary information inherent in each view. An extensive experiment with diverse datasets, benchmarking the state-of-the-art multi-view clustering methods, has demonstrated the effectiveness of the proposed method in obtaining accurate clustering solution.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"6 1","pages":"865-876"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73425873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
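The abstract does not give the paper's exact objective. Purely as an illustrative sketch of how consensus information is commonly encoded in factorization-based multi-view clustering, one standard NMF-style formulation couples view-specific factors with a shared consensus factor; the complementary-enhancing term and the manifold regularizer described above would be added on top of a loss of this shape. This formulation is an assumption for illustration, not the paper's actual model.

```latex
\min_{U^{(v)} \ge 0,\; V^{(v)} \ge 0,\; V^{*}}
\quad
\sum_{v=1}^{m} \big\| X^{(v)} - U^{(v)} V^{(v)\top} \big\|_F^2
\;+\; \alpha \sum_{v=1}^{m} \big\| V^{(v)} - V^{*} \big\|_F^2
```

Here $X^{(v)}$ is the data matrix of view $v$, $V^{(v)}$ its low-dimensional representation, $V^{*}$ the consensus representation shared by all views, and $\alpha$ trades off reconstruction accuracy against agreement with the consensus.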
Message from the ICDE 2020 Chairs
Pub Date : 2020-04-01 DOI: 10.1109/icde48307.2020.00005
{"title":"Message from the ICDE 2020 Chairs","authors":"","doi":"10.1109/icde48307.2020.00005","DOIUrl":"https://doi.org/10.1109/icde48307.2020.00005","url":null,"abstract":"","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73778172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Two-Level Data Compression using Machine Learning in Time Series Database
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00119
Xinyang Yu, Yanqing Peng, Feifei Li, Sheng Wang, Xiaowei Shen, Huijun Mai, Yue Xie
The explosion of time series data advances the development of time series databases. To reduce storage overhead in these systems, data compression is widely adopted. Most existing compression algorithms utilize the overall characteristics of the entire time series to achieve a high compression ratio, but ignore local contexts around individual points. As a result, they are effective for certain data patterns but may suffer from the inherent pattern changes in real-world time series. It is therefore strongly desirable to have a compression method that can consistently achieve a high compression ratio in the presence of pattern diversity. In this paper, we propose a two-level compression model that selects a proper compression scheme for each individual point, so that diverse patterns can be captured at a fine granularity. Based on this model, we design and implement the AMMMO framework, in which a set of control parameters is defined to distill and categorize data patterns. At the top level, we evaluate each sub-sequence to fill in these parameters, generating a set of compression scheme candidates (i.e., major mode selection). At the bottom level, we choose the best scheme from these candidates for each data point (i.e., sub-mode selection). To effectively handle diverse data patterns, we introduce a reinforcement learning based approach to learn parameter values automatically. Our experimental evaluation shows that our approach improves compression ratio by up to 120% (with an average of 50%) compared to other time-series compression methods.
{"title":"Two-Level Data Compression using Machine Learning in Time Series Database","authors":"Xinyang Yu, Yanqing Peng, Feifei Li, Sheng Wang, Xiaowei Shen, Huijun Mai, Yue Xie","doi":"10.1109/ICDE48307.2020.00119","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00119","url":null,"abstract":"The explosion of time series advances the development of time series databases. To reduce storage overhead in these systems, data compression is widely adopted. Most existing compression algorithms utilize the overall characteristics of the entire time series to achieve high compression ratio, but ignore local contexts around individual points. In this way, they are effective for certain data patterns, and may suffer inherent pattern changes in real-world time series. It is therefore strongly desired to have a compression method that can always achieve high compression ratio in the existence of pattern diversity.In this paper, we propose a two-level compression model that selects a proper compression scheme for each individual point, so that diverse patterns can be captured at a fine granularity. Based on this model, we design and implement AMMMO framework, where a set of control parameters is defined to distill and categorize data patterns. At the top level, we evaluate each sub-sequence to fill in these parameters, generating a set of compression scheme candidates (i.e., major mode selection). At the bottom level, we choose the best scheme from these candidates for each data point respectively (i.e., sub-mode selection). To effectively handle diverse data patterns, we introduce a reinforcement learning based approach to learn parameter values automatically. Our experimental evaluation shows that our approach improves compression ratio by up to 120% (with an average of 50%), compared to other time-series compression methods.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"8 1","pages":"1333-1344"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83698489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
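AMMMO's actual control parameters and its reinforcement-learning policy are not reproduced here. The sketch below, with hypothetical scheme names and a simplified bit-cost model, only illustrates the two-level idea from the abstract: a major mode fixes a small candidate set per sub-sequence, and a sub-mode is then chosen per point as whichever candidate scheme encodes that point most compactly.

```python
from typing import Callable, Dict, List

# Hypothetical per-point transforms; real timestamp/value schemes
# (delta, delta-of-delta, xor, etc.) would go here.
SCHEMES: Dict[str, Callable[[int, int], int]] = {
    "delta": lambda prev, cur: cur - prev,
    "xor":   lambda prev, cur: cur ^ prev,
}

def bits_needed(residual: int) -> int:
    """Toy cost model: bits to store the transformed value (sign + magnitude)."""
    return 1 + max(1, abs(residual).bit_length())

def compress_subsequence(points: List[int], major_mode: List[str]) -> int:
    """Bottom level: per point, pick the cheapest sub-mode from the major mode's
    candidate set and accumulate the simulated encoded size in bits."""
    total_bits, prev = 0, points[0]
    for cur in points[1:]:
        total_bits += min(bits_needed(SCHEMES[s](prev, cur)) for s in major_mode)
        prev = cur
    return total_bits

def compress(series: List[int], block: int = 32) -> int:
    """Top level: per sub-sequence, pick the major mode (candidate set) yielding
    the smallest size; in AMMMO a learned policy replaces this exhaustive scan."""
    major_modes = [["delta"], ["xor"], ["delta", "xor"]]
    total = 0
    for i in range(0, len(series), block):
        sub = series[i:i + block]
        if len(sub) > 1:
            total += min(compress_subsequence(sub, m) for m in major_modes)
    return total

print(compress(list(range(0, 4000, 7))), "bits (toy example)")
```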
SFour: A Protocol for Cryptographically Secure Record Linkage at Scale
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00031
Basit Khurram, F. Kerschbaum
The prevalence of various (and increasingly large) datasets presents the challenging problem of discovering common entities dispersed across disparate datasets. Solutions to the private record linkage problem (PRL) aim to enable such explorations of datasets in a secure manner. A two-party PRL protocol allows two parties to determine for which entities they each possess a record (either an exact matching record or a fuzzy matching record) in their respective datasets — without revealing to one another information about any entities for which they do not both possess records. Although several solutions have been proposed to solve the PRL problem, no current solution offers a fully cryptographic security guarantee while maintaining both high accuracy of output and subquadratic runtime efficiency. To this end, we propose the first known efficient PRL protocol that runs in subquadratic time, provides high accuracy, and guarantees cryptographic security in the semi-honest security model.
{"title":"SFour: A Protocol for Cryptographically Secure Record Linkage at Scale","authors":"Basit Khurram, F. Kerschbaum","doi":"10.1109/ICDE48307.2020.00031","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00031","url":null,"abstract":"The prevalence of various (and increasingly large) datasets presents the challenging problem of discovering common entities dispersed across disparate datasets. Solutions to the private record linkage problem (PRL) aim to enable such explorations of datasets in a secure manner. A two-party PRL protocol allows two parties to determine for which entities they each possess a record (either an exact matching record or a fuzzy matching record) in their respective datasets — without revealing to one another information about any entities for which they do not both possess records. Although several solutions have been proposed to solve the PRL problem, no current solution offers a fully cryptographic security guarantee while maintaining both high accuracy of output and subquadratic runtime efficiency. To this end, we propose the first known efficient PRL protocol that runs in subquadratic time, provides high accuracy, and guarantees cryptographic security in the semi-honest security model.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"83 1","pages":"277-288"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83065431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
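The cryptographic protocol itself is not sketched here. As a point of reference for the problem being solved, the snippet below shows a plaintext record-linkage baseline: blocking keeps candidate generation subquadratic, and a Jaccard comparison over character bigrams decides fuzzy matches. All field names and thresholds are illustrative, and nothing in this sketch is private or secure; SFour's contribution is performing this kind of linkage under cryptographic guarantees.

```python
from collections import defaultdict

def tokens(name: str) -> set:
    """Character bigrams, a common fuzzy-matching representation."""
    s = name.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}

def link(records_a, records_b, threshold=0.6):
    """Plaintext record linkage: block on a cheap key so only records sharing a
    block are compared (subquadratic in practice), then accept pairs whose
    Jaccard similarity over bigrams meets the threshold."""
    blocks = defaultdict(list)
    for rid, name in records_b:
        blocks[name.lower()[:1]].append((rid, name))   # illustrative blocking key
    matches = []
    for rid_a, name_a in records_a:
        for rid_b, name_b in blocks[name_a.lower()[:1]]:
            ta, tb = tokens(name_a), tokens(name_b)
            if ta and tb and len(ta & tb) / len(ta | tb) >= threshold:
                matches.append((rid_a, rid_b))
    return matches

print(link([(1, "Jon Smith"), (2, "Ana Lopez")],
           [(10, "John Smith"), (11, "Anna Lopes")]))
```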
Optimizing Knowledge Graphs through Voting-based User Feedback
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00043
Ruida Yang, Xin Lin, Jianliang Xu, Yan Yang, Liang He
Knowledge graphs have been used in a wide range of applications to support search, recommendation, and question answering (Q&A). For example, in Q&A systems, given a new question, we may use a knowledge graph to automatically identify the most suitable answers based on similarity evaluation. However, such systems may suffer from two major limitations. First, the knowledge graph constructed based on source data may contain errors. Second, the knowledge graph may become out of date and cannot quickly adapt to new knowledge. To address these issues, in this paper, we propose an interactive framework that refines and optimizes knowledge graphs through user votes. We develop an efficient similarity evaluation notion, called extended inverse P-distance, based on which the graph optimization problem can be formulated as a signomial geometric programming problem. We then propose a basic single-vote solution and a more advanced multi-vote solution for graph optimization. We also propose a split-and-merge optimization strategy to scale up the multi-vote solution. Extensive experiments based on real-life and synthetic graphs demonstrate the effectiveness and efficiency of our proposed framework.
{"title":"Optimizing Knowledge Graphs through Voting-based User Feedback","authors":"Ruida Yang, Xin Lin, Jianliang Xu, Yan Yang, Liang He","doi":"10.1109/ICDE48307.2020.00043","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00043","url":null,"abstract":"Knowledge graphs have been used in a wide range of applications to support search, recommendation, and question answering (Q&A). For example, in Q&A systems, given a new question, we may use a knowledge graph to automatically identify the most suitable answers based on similarity evaluation. However, such systems may suffer from two major limitations. First, the knowledge graph constructed based on source data may contain errors. Second, the knowledge graph may become out of date and cannot quickly adapt to new knowledge. To address these issues, in this paper, we propose an interactive framework that refines and optimizes knowledge graphs through user votes. We develop an efficient similarity evaluation notion, called extended inverse P-distance, based on which the graph optimization problem can be formulated as a signomial geometric programming problem. We then propose a basic single-vote solution and a more advanced multi-vote solution for graph optimization. We also propose a split-and-merge optimization strategy to scale up the multi-vote solution. Extensive experiments based on real-life and synthetic graphs demonstrate the effectiveness and efficiency of our proposed framework.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"1 1","pages":"421-432"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77809538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
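The extended inverse P-distance is not defined in the abstract. For context, the classical inverse P-distance of Jeh and Widom, which is closely related to personalized PageRank and is the natural starting point for such an extension, scores a node $v$ with respect to a node $u$ as a decayed sum over all walks from $u$ to $v$:

```latex
r_u(v) \;=\; \sum_{t\,:\,u \rightsquigarrow v} P[t]\; c\,(1-c)^{\ell(t)},
\qquad
P[t] \;=\; \prod_{i=0}^{\ell(t)-1} \frac{1}{\lvert O(w_i) \rvert}
```

where the sum ranges over walks $t = \langle u = w_0, w_1, \dots, w_{\ell(t)} = v \rangle$, $O(w_i)$ denotes the out-neighbors of $w_i$, and $c \in (0,1)$ is the decay (restart) probability.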
Data Sentinel: A Declarative Production-Scale Data Validation Platform
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00140
A. Swami, Sriram Vasudevan, Joojay Huyn
Many organizations process big data for important business operations and decisions. Hence, data quality greatly affects their success. Data quality problems continue to be widespread, costing US businesses an estimated $600 billion annually. To date, addressing data quality in production environments still poses many challenges: easily defining properties of high-quality data; validating production-scale data in a timely manner; debugging poor quality data; designing data quality solutions to be easy to use, understand, and operate; and designing data quality solutions to easily integrate with other systems. Current data validation solutions do not comprehensively address these challenges. To address data quality in production environments at LinkedIn, we developed Data Sentinel, a declarative production-scale data validation platform. In a simple and well-structured configuration, users declaratively specify the desired data checks. Then, Data Sentinel performs these data checks and writes the results to an easily understandable report. Furthermore, Data Sentinel provides well-defined schemas for the configuration and report. This makes it easy for other systems to interface or integrate with Data Sentinel. To make Data Sentinel even easier to use, understand, and operate in production environments, we provide Data Sentinel Service (DSS), a complementary system to help specify data checks, schedule, deploy, and tune data validation jobs, and understand data checking results. The contributions of this paper are: 1) Data Sentinel, a declarative production-scale data validation platform successfully deployed at LinkedIn; 2) a generic design to build and deploy similar systems for production environments; and 3) experiences and lessons learned that can benefit practitioners with similar objectives.
{"title":"Data Sentinel: A Declarative Production-Scale Data Validation Platform","authors":"A. Swami, Sriram Vasudevan, Joojay Huyn","doi":"10.1109/ICDE48307.2020.00140","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00140","url":null,"abstract":"Many organizations process big data for important business operations and decisions. Hence, data quality greatly affects their success. Data quality problems continue to be widespread, costing US businesses an estimated $600 billion annually. To date, addressing data quality in production environments still poses many challenges: easily defining properties of high-quality data; validating production-scale data in a timely manner; debugging poor quality data; designing data quality solutions to be easy to use, understand, and operate; and designing data quality solutions to easily integrate with other systems. Current data validation solutions do not comprehensively address these challenges. To address data quality in production environments at LinkedIn, we developed Data Sentinel, a declarative production-scale data validation platform. In a simple and well-structured configuration, users declaratively specify the desired data checks. Then, Data Sentinel performs these data checks and writes the results to an easily understandable report. Furthermore, Data Sentinel provides well-defined schemas for the configuration and report. This makes it easy for other systems to interface or integrate with Data Sentinel. To make Data Sentinel even easier to use, understand, and operate in production environments, we provide Data Sentinel Service (DSS), a complementary system to help specify data checks, schedule, deploy, and tune data validation jobs, and understand data checking results. The contributions of this paper include the following: 1) Data Sentinel, a declarative production-scale data validation platform successfully deployed at LinkedIn 2) A generic design to build and deploy similar systems for production environments 3) Experiences and lessons learned that can benefit practitioners with similar objectives.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"13 1","pages":"1579-1590"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83360330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
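Data Sentinel's actual configuration schema is not given in the abstract. The sketch below is a hypothetical illustration of the declarative style it describes: the desired checks are stated as data, a small runner evaluates them, and the result is an easily readable report. Column names, check keywords, and the pandas-based runner are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical declarative configuration: each entry names a column and the
# properties it must satisfy (the real Data Sentinel schema will differ).
CONFIG = {
    "dataset": "member_profiles",
    "checks": [
        {"column": "member_id", "not_null": True, "unique": True},
        {"column": "age", "min": 13, "max": 120},
        {"column": "country", "allowed": ["US", "DE", "IN"]},
    ],
}

def run_checks(df: pd.DataFrame, config: dict) -> list:
    """Evaluate each declared check and return a simple violation report."""
    report = []
    for check in config["checks"]:
        col = df[check["column"]]
        failed = 0
        if check.get("not_null"):
            failed += int(col.isna().sum())
        if check.get("unique"):
            failed += int(col.duplicated().sum())
        if "min" in check:
            failed += int((col < check["min"]).sum())
        if "max" in check:
            failed += int((col > check["max"]).sum())
        if "allowed" in check:
            failed += int((~col.isin(check["allowed"])).sum())
        report.append({"column": check["column"], "violations": failed})
    return report

df = pd.DataFrame({"member_id": [1, 2, 2], "age": [25, 300, 40],
                   "country": ["US", "FR", "DE"]})
print(run_checks(df, CONFIG))
```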
CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00103
Mo Li, F. Choudhury, Renata Borovica-Gajic, Zhiqiong Wang, Junchang Xin, Jianxin Li
SimRank is a significant metric for measuring the similarity of nodes in graph data analysis. The problem of SimRank computation has been studied extensively; however, no existing work provides a unified algorithm that supports SimRank computation on both static and temporal graphs. In this work, we first propose CrashSim, an index-free algorithm for single-source SimRank computation in static graphs. CrashSim can provide provable approximation guarantees for the computational results in an efficient way. In addition, as real-life graphs are often represented as temporal graphs, CrashSim enables efficient computation of SimRank in temporal graphs. We formally define two typical SimRank queries in temporal graphs, and then solve them by developing an efficient algorithm based on CrashSim, called CrashSim-T. An extensive experimental evaluation using five real-life and synthetic datasets shows that the CrashSim and CrashSim-T algorithms substantially improve the efficiency of state-of-the-art SimRank algorithms by about 30%, while achieving a result-set precision of about 97%.
{"title":"CrashSim: An Efficient Algorithm for Computing SimRank over Static and Temporal Graphs","authors":"Mo Li, F. Choudhury, Renata Borovica-Gajic, Zhiqiong Wang, Junchang Xin, Jianxin Li","doi":"10.1109/ICDE48307.2020.00103","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00103","url":null,"abstract":"SimRank is a significant metric to measure the similarity of nodes in graph data analysis. The problem of SimRank computation has been studied extensively, however there is no existing work that can provide one unified algorithm to support the SimRank computation both on static and temporal graphs. In this work, we first propose CrashSim, an index-free algorithm for single-source SimRank computation in static graphs. CrashSim can provide provable approximation guarantees for the computational results in an efficient way. In addition, as the reallife graphs are often represented as temporal graphs, CrashSim enables efficient computation of SimRank in temporal graphs. We formally define two typical SimRank queries in temporal graphs, and then solve them by developing an efficient algorithm based on CrashSim, called CrashSim-T. From the extensive experimental evaluation using five real-life and synthetic datasets, it can be seen that the CrashSim algorithm and CrashSim-T algorithm substantially improve the efficiency of the state-of-the-art SimRank algorithms by about 30%, while achieving the precision of the result set with about 97%.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"25 1","pages":"1141-1152"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89364839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
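CrashSim's estimator is not reproduced here. For reference, the SimRank measure it computes follows the standard recursive definition, where $I(a)$ denotes the in-neighbors of node $a$ and $C \in (0,1)$ is a decay factor:

```latex
s(a, b) =
\begin{cases}
1, & a = b,\\[2pt]
0, & I(a) = \emptyset \ \text{or}\ I(b) = \emptyset,\\[2pt]
\dfrac{C}{\lvert I(a)\rvert\,\lvert I(b)\rvert}
  \displaystyle\sum_{x \in I(a)} \sum_{y \in I(b)} s(x, y), & \text{otherwise.}
\end{cases}
```

A single-source query fixes one node $a$ and asks for $s(a, v)$ for every node $v$ in the graph.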
SVkNN: Efficient Secure and Verifiable k-Nearest Neighbor Query on the Cloud Platform*
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00029
Ningning Cui, Xiaochun Yang, Bin Wang, Jianxin Li, Guoren Wang
With the boom in cloud computing, data outsourcing in location-based services is proliferating and has attracted increasing interest from research communities and commercial applications. Nevertheless, since the cloud server may be both untrusted and malicious, concerns about data security and result integrity have risen sharply. However, little existing work can assure data security and result integrity in a unified way. In this paper, we study the problem of secure and verifiable k nearest neighbor query (SVkNN). To support SVkNN, we first propose a novel unified structure, called the verifiable and secure index (VSI). Based on this, we devise a series of secure protocols to facilitate query processing and develop a compact verification strategy. Given an SVkNN query, our proposed solution not only answers the query efficiently but also guarantees: 1) preserving the privacy of data, queries, results and access patterns; and 2) authenticating the correctness and completeness of the results without leaking confidentiality. Finally, the formal security analysis and complexity analysis are theoretically proven, and the performance and feasibility of our proposed approaches are empirically evaluated and demonstrated.
{"title":"SVkNN: Efficient Secure and Verifiable k-Nearest Neighbor Query on the Cloud Platform*","authors":"Ningning Cui, Xiaochun Yang, Bin Wang, Jianxin Li, Guoren Wang","doi":"10.1109/ICDE48307.2020.00029","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00029","url":null,"abstract":"With the boom in cloud computing, data outsourcing in location-based services is proliferating and has attracted increasing interest from research communities and commercial applications. Nevertheless, since the cloud server is probably both untrusted and malicious, concerns of data security and result integrity have become on the rise sharply. However, there exist little work that can commendably assure the data security and result integrity using a unified way. In this paper, we study the problem of secure and verifiable k nearest neighbor query (SVkNN). To support SVkNN, we first propose a novel unified structure, called verifiable and secure index (VSI). Based on this, we devise a series of secure protocols to facilitate query processing and develop a compact verification strategy. Given an SVkNN query, our proposed solution can not merely answer the query efficiently while can guarantee: 1) preserving the privacy of data, query, result and access patterns; 2) authenticating the correctness and completeness of the results without leaking the confidentiality. Finally, the formal security analysis and complexity analysis are theoretically proven and the performance and feasibility of our proposed approaches are empirically evaluated and demonstrated.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"7 1","pages":"253-264"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86526204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28
Practical Anonymous Subscription with Revocation Based on Broadcast Encryption
Pub Date : 2020-04-01 DOI: 10.1109/ICDE48307.2020.00028
X. Yi, Russell Paulet, E. Bertino, Fang-Yu Rao
In this paper we consider the problem where a client wishes to subscribe to some product or service provided by a server, but maintain their anonymity. At the same time, the server must be able to authenticate the client as a genuine user and be able to discontinue (or revoke) the client’s access if the subscription fees are not paid. Current solutions for this problem are typically constructed using some combination of blind signature or zero-knowledge proof techniques, which do not directly support client revocation (that is, revoking a user before expiry of their secret value). In this paper, we present a solution for this problem on the basis of the broadcast encryption scheme, suggested by Boneh et al., by which the server can broadcast a secret to a group of legitimate clients. Our solution allows the registered client to log into the server anonymously and also supports client revocation by the server. Our solution can be used in many applications, such as location-based queries. We formally define a model for our anonymous subscription protocol and prove the security of our solution under this model. In addition, we present experimental results from an implementation of our protocol. These experimental results demonstrate that our protocol is practical.
{"title":"Practical Anonymous Subscription with Revocation Based on Broadcast Encryption","authors":"X. Yi, Russell Paulet, E. Bertino, Fang-Yu Rao","doi":"10.1109/ICDE48307.2020.00028","DOIUrl":"https://doi.org/10.1109/ICDE48307.2020.00028","url":null,"abstract":"In this paper we consider the problem where a client wishes to subscribe to some product or service provided by a server, but maintain their anonymity. At the same time, the server must be able to authenticate the client as a genuine user and be able to discontinue (or revoke) the client’s access if the subscription fees are not paid. Current solutions for this problem are typically constructed using some combination of blind signature or zero-knowledge proof techniques, which do not directly support client revocation (that is, revoking a user before expiry of their secret value). In this paper, we present a solution for this problem on the basis of the broadcast encryption scheme, suggested by Boneh et al., by which the server can broadcast a secret to a group of legitimate clients. Our solution allows the registered client to log into the server anonymously and also supports client revocation by the server. Our solution can be used in many applications, such as location-based queries. We formally define a model for our anonymous subscription protocol and prove the security of our solution under this model. In addition, we present experimental results from an implementation of our protocol. These experimental results demonstrate that our protocol is practical.","PeriodicalId":6709,"journal":{"name":"2020 IEEE 36th International Conference on Data Engineering (ICDE)","volume":"3 1","pages":"241-252"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87917424","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
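The concrete construction of the Boneh et al. scheme is beyond the scope of the abstract. As background, a public-key broadcast encryption scheme consists of three algorithms; in the subscription setting above, the broadcast set $S$ can encode exactly the currently paid-up clients, so revoking a client amounts to leaving its index out of $S$ in subsequent broadcasts. This mapping to the paper's protocol is an informal reading of the abstract, not its specification.

```latex
\begin{aligned}
\mathsf{Setup}(n) &\rightarrow (PK,\, d_1, \dots, d_n)
  && \text{one public key, one private key per user}\\
\mathsf{Encrypt}(S, PK) &\rightarrow (\mathit{Hdr},\, K)
  && \text{header and session key for a subset } S \subseteq \{1,\dots,n\}\\
\mathsf{Decrypt}(S, i, d_i, \mathit{Hdr}, PK) &\rightarrow K
  && \text{succeeds only for users } i \in S
\end{aligned}
```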