Query plan caching eliminates the need for repeated query optimization and hence has strong practical implications for relational database management systems (RDBMSs). Unfortunately, existing approaches consider only the query plan generated at the expected values of the parameters that characterize the query, the data, and the current state of the system, even though these parameters may take different values during the lifetime of a cached plan. A better alternative is to harvest the optimizer's plan choices for different parameter values, populate the cache with promising query plans, and select a cached plan based upon the current parameter values. To address this challenge, we propose a parametric plan caching (PPC) framework that uses an online plan space clustering algorithm. The clustering algorithm is density-based, and it exploits locality-sensitive hashing as a pre-processing step so that clusters in the plan spaces can be efficiently stored in database histograms and queried in constant time. We experimentally validate that our approach is precise, efficient in space and time, and adaptive, requiring no eager exploration of the optimizer's plan spaces.
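The abstract does not spell out the hashing scheme, so as a hedged illustration only, a random-hyperplane LSH that buckets parameter-space points before density-based clustering might look like the following sketch (all names and the choice of hyperplane hashing are assumptions, not the paper's actual method):

```python
import random

def lsh_signature(point, hyperplanes):
    """Hash a parameter-space point to a bucket key using random-hyperplane
    LSH: one bit per hyperplane, set by which side of the plane the point
    falls on. Nearby points tend to receive the same signature."""
    return tuple(
        1 if sum(w * x for w, x in zip(h, point)) >= 0 else 0
        for h in hyperplanes
    )

def bucket_points(points, num_planes=8, dim=2, seed=42):
    """Group points whose signatures collide, so that a density-based
    clustering pass only needs to examine points within (and adjacent to)
    each bucket instead of all pairs."""
    rng = random.Random(seed)
    hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)]
                   for _ in range(num_planes)]
    buckets = {}
    for p in points:
        buckets.setdefault(lsh_signature(p, hyperplanes), []).append(p)
    return buckets
```

Once bucketed, each bucket key can act as a compact histogram bin, which is in the spirit of the constant-time cached-plan lookup the paper describes.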
Günes Aluç, David DeHaan, Ivan T. Bowman. "Parametric Plan Caching Using Density-Based Clustering." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.57
Sequence analysis plays an important role in many applications. Typically, each sequence is associated with an ordered list of elements; for example, in a movie rental application, a customer's rental record, an ordered list of movies, is a sequence. Most studies on sequence analysis focus on subsequence matching, which finds all sequences stored in the database such that a given query sequence is a subsequence of each of them. In many applications, elements are associated with properties or attributes; for example, each movie carries attributes such as "Director" and "Actors". Unfortunately, to the best of our knowledge, no existing study on sequence analysis considers the attributes of elements. In this paper, we propose two problems. The first is: given a query sequence and a set of sequences, and taking element attributes into account, find all sequences matched by the query sequence. We call this problem attribute-based subsequence matching (ASM). All existing applications of traditional subsequence matching also apply to this new problem, provided that the attributes of elements are available. We propose an efficient algorithm for ASM. The key to its efficiency is compressing each whole sequence, with its potentially many associated attributes, into just a triplet of numbers; operating on these highly compressed representations greatly speeds up attribute-based subsequence matching. The second problem is to find all frequent attribute-based subsequences. We adapt an existing efficient algorithm for this second problem to show that the algorithm developed for the first problem can be reused. Empirical studies show that our algorithms scale to large datasets; in particular, they run at least an order of magnitude faster than a straightforward method in most cases. This work can benefit a number of existing data mining problems that are fundamentally based on subsequence matching, such as sequence classification, frequent sequence mining, motif detection, and sequence matching in bioinformatics.
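The triplet compression is the paper's contribution and is not detailed in the abstract; what can be shown here is only the baseline matching semantics it accelerates. A minimal sketch of attribute-based subsequence matching, assuming elements are (name, attribute-dict) pairs (a representation I am supplying for illustration):

```python
def matches(query, sequence):
    """Return True if `query` is an attribute-based subsequence of
    `sequence`. Each element is (name, attrs); a query element matches a
    data element when the names agree and every queried attribute is
    present with the same value. Order must be preserved, so we walk the
    data sequence with a single shared iterator."""
    it = iter(sequence)
    for q_name, q_attrs in query:
        for s_name, s_attrs in it:
            if s_name == q_name and all(s_attrs.get(k) == v
                                        for k, v in q_attrs.items()):
                break  # matched this query element; continue with the next
        else:
            return False  # data sequence exhausted before a match
    return True
```

A straightforward method would run this check against every stored sequence; the paper's compressed representation exists precisely to avoid that per-sequence scan.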
Yu Peng, R. C. Wong, Liangliang Ye, Philip S. Yu. "Attribute-Based Subsequence Matching and Mining." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.81
At the beginning of the Web 2.0 era, Online Social Networks (OSNs) appeared as just another phenomenon alongside wikis, blogs, video sharing, and so on. However, they soon became one of the biggest revolutions of the Internet era. Statistics confirm the continuing rise in the importance of social networking sites in terms of the number of users (e.g., Facebook has reached 750 million users, Twitter 200 million, LinkedIn 100 million), the time spent on social networking sites, and the amount of data flowing through them (e.g., Facebook users interact with about 900 million pieces of data in the form of pages, groups, events, and community pages). This success makes OSNs one of the most promising paradigms for information sharing on the Web.
B. Carminati, E. Ferrari, Jacopo Girardi. "Trust and Share: Trusted Information Sharing in Online Social Networks." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.127
Recently, many micro-blog message sharing applications have emerged on the web. Users can publish short messages freely and get notified of subscriptions instantly. Prominent examples include Twitter, Facebook statuses, and Sina Weibo in China. The micro-blog platform has become a useful service for real-time information creation and propagation. However, the short length and dynamic nature of these messages pose great challenges for effective content understanding. Additionally, the noise and fragmentation make it difficult to discover the temporal propagation trails that trace the development of micro-blog messages. In this paper, we propose a provenance model to capture connections between micro-blog messages. Provenance refers to data origin identification and transformation logging, and it has proven of great value in recent database and workflow systems. To cope with the real-time micro-message deluge, we utilize a novel message grouping approach to encode and maintain the provenance information. Furthermore, we adopt a summary index and several adaptive pruning strategies to implement efficient provenance updating. Based on the index, our provenance solution can support rich query retrieval and intuitive message tracking for effective message organization. Experiments conducted on a real dataset verify the effectiveness and efficiency of our approach.
Junjie Yao, B. Cui, Zijun Xue, Qi Liu. "Provenance-based Indexing Support in Micro-blog Platforms." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.36
Neuroscientists increasingly use computational tools in building and simulating models of the brain. The amounts of data involved in these simulations are immense, and efficiently managing this data is key. One particular problem in analyzing this data is the scalable execution of range queries on spatial models of the brain. Known indexing approaches do not perform well even on today's small models, which represent only a small fraction of the brain and contain just a few million densely packed spatial elements. The problem with current approaches is that as the level of detail in the models increases, the overlap in the tree structure also increases, ultimately slowing down query execution. The neuroscientists' need to work with bigger and more detailed (denser) models thus motivates us to develop a new indexing approach. To this end we develop FLAT, a scalable indexing approach for dense data sets. We base the development of FLAT on the key observation that current approaches suffer from overlap on dense data sets. We hence design FLAT as an approach with two phases, each independent of density. In the first phase it uses a traditional spatial index to efficiently retrieve an initial object. In the second phase it traverses the initial object's neighborhood to retrieve the remaining query results. Our experimental results show that FLAT not only outperforms R-Tree variants by a factor of two to eight, but also achieves independence from data set size and density.
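The two-phase idea can be sketched in a few lines, assuming (as a simplification, not FLAT's actual data structures) a seed-index callable that returns one object inside the query box and a precomputed neighbor-link table:

```python
from collections import deque

def range_query(seed_index, neighbors, objects, box):
    """Two-phase range query in the spirit of FLAT. Phase 1: ask a small
    seed index for one object inside `box`. Phase 2: crawl outward through
    neighbor links, keeping every visited object inside the box. `objects`
    maps ids to coordinate tuples; `box` is one (lo, hi) pair per dimension.
    A sketch only: it expands solely from in-box objects, whereas the real
    system's neighborhood traversal is more careful."""
    def inside(oid):
        return all(lo <= c <= hi for c, (lo, hi) in zip(objects[oid], box))
    seed = seed_index(box)            # phase 1: any one object in range
    if seed is None:
        return set()
    result, frontier, seen = set(), deque([seed]), {seed}
    while frontier:                   # phase 2: neighborhood traversal
        oid = frontier.popleft()
        if inside(oid):
            result.add(oid)
            for nxt in neighbors[oid]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return result
```

The point of the design is that neither phase depends on how densely the objects are packed, which is what breaks overlap-prone tree indexes.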
F. Tauheed, Laurynas Biveinis, T. Heinis, F. Schürmann, H. Markram, A. Ailamaki. "Accelerating Range Queries for Brain Simulations." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.56
A system for efficient keyword search in graphs is demonstrated. The system has two components: a search, restricted to the nodes containing the input keywords, for a set of nodes that are close to each other and together cover the input keywords; and an exploration that finds how these nodes are related to each other. The system generates all or top-k answers with polynomial delay. Answers are presented to the user according to a ranking criterion, so that answers whose nodes are closer to each other are presented before those whose nodes are farther apart. In addition, the set of answers produced by our system is duplication-free. The system uses two methods for presenting the final answer to the user; these methods reveal the relationships among the nodes in an answer through either a tree or a multi-center graph, and each method has its own advantages and disadvantages. The system is demonstrated using two challenging datasets: the very large DBLP and the highly cyclic Mondial. Challenges and difficulties in implementing an efficient keyword search system are also demonstrated.
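To make the answer model concrete, here is a naive, hedged sketch of duplication-free, distance-ranked keyword answers: one node per keyword, ranked by the largest pairwise distance in the answer. Note that this brute-force enumeration does not achieve the polynomial delay of the demonstrated system; the data layout (`dist`, `hits`) is an assumption for illustration:

```python
from itertools import product

def keyword_answers(dist, hits, k):
    """Enumerate candidate answers for a keyword query. `hits` maps each
    keyword to the nodes containing it; `dist` is a precomputed
    shortest-path distance lookup. An answer is one node per keyword,
    scored by its largest internal pairwise distance (smaller is better);
    duplicate node sets are suppressed."""
    seen, ranked = set(), []
    for combo in product(*hits.values()):
        nodes = frozenset(combo)
        if nodes in seen:
            continue                     # duplication-free answer set
        seen.add(nodes)
        ns = sorted(nodes)
        score = max((dist[a][b] for i, a in enumerate(ns) for b in ns[i + 1:]),
                    default=0)
        ranked.append((score, ns))
    return [ns for _, ns in sorted(ranked)[:k]]
```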
M. Kargar, Aijun An. "Efficient Top-k Keyword Search in Graphs with Polynomial Delay." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.124
Text clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms on online platforms such as the web, social networks, and other information networks. In most cases, the data is not purely available in text form; a lot of side-information is available along with the text documents. Such side-information may be of different kinds, such as links in the document, user-access behavior from web logs, or other non-textual attributes embedded in the text document. These attributes may contain a tremendous amount of information for clustering purposes. However, the relative importance of this side-information may be difficult to estimate, especially when some of the information is noisy. In such cases, it can be risky to incorporate side-information into the clustering process, because it can either improve the quality of the representation for clustering or add noise to the process. Therefore, we need a principled way to perform clustering that maximizes the advantage of using this side-information. In this paper, we design an algorithm which combines classical partitioning algorithms with probabilistic models in order to create an effective clustering approach. We present experimental results on a number of real data sets that illustrate the advantages of this approach.
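The paper's actual method is probabilistic; purely as a hedged sketch of the underlying idea, the assignment step of a partitioning algorithm can blend text distance with side-attribute distance through a weight `alpha` (all names and the linear blend are my assumptions, not the paper's model):

```python
def assign(doc_vecs, side_vecs, centroids, side_centroids, alpha):
    """Assign each document to the cluster minimizing a combined distance:
    `alpha` weighs the text representation against the side-information
    attributes. Learning how much to trust the side information (rather
    than fixing alpha) is the kind of principled weighting the paper
    argues for."""
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    labels = []
    for d, s in zip(doc_vecs, side_vecs):
        scores = [alpha * sqdist(d, c) + (1 - alpha) * sqdist(s, sc)
                  for c, sc in zip(centroids, side_centroids)]
        labels.append(scores.index(min(scores)))
    return labels
```

With `alpha = 1` this degenerates to text-only partitioning, which is the safe fallback when the side information turns out to be noise.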
C. Aggarwal, Yuchen Zhao, Philip S. Yu. "On Text Clustering with Side Information." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.111
Several efforts have been made toward more privacy-aware Online Social Networks (OSNs) that protect personal data against various privacy threats. However, despite the relevance of these proposals, we believe there is still a lack of a conceptual model on top of which privacy tools can be designed. Central to this model should be the concept of risk. Therefore, in this paper, we propose a risk measure for OSNs. The aim is to associate a risk level with each social network user in order to provide other users with a measure of how risky, in terms of disclosure of private information, it might be to interact with them. We compute risk levels based on similarity and benefit measures, while also taking user risk attitudes into account. In particular, we adopt an active learning approach for risk estimation, in which a user's risk attitude is learned from a few required user interactions. The risk estimation process discussed in this paper has been implemented as a Facebook application and tested on real data. The experiments show the effectiveness of our proposal.
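The abstract names the ingredients without the formula, so the following is a purely hypothetical combination, not the paper's measure: lower similarity and lower benefit make an interaction riskier, scaled by a learned risk attitude in [0, 1]:

```python
def risk_level(similarity, benefit, risk_attitude):
    """Illustrative risk score. `similarity` and `benefit` are assumed
    normalized to [0, 1]; `risk_attitude` is the learned tolerance
    (0 = fully risk-averse, 1 = fully risk-tolerant). Higher output means
    the interaction looks riskier to this user."""
    raw = 1.0 - 0.5 * (similarity + benefit)   # base risk in [0, 1]
    return raw * (1.0 - risk_attitude)         # attenuate by tolerance
```

The active-learning step in the paper would fit where `risk_attitude` comes from: a few labeled interactions per user, rather than a fixed constant.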
C. Akcora, B. Carminati, E. Ferrari. "Privacy in Social Networks: How Risky is Your Social Graph?" 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.99
The encrypted execution of database queries promises powerful security protections; however, users are currently unlikely to benefit without significant expertise. In this demonstration, we illustrate a simple workflow enabling users to design secure executions of their queries. The DataStorm system demonstrated here simplifies both the design and execution of encrypted execution plans, and represents progress toward the challenge of developing a general planner for encrypted query execution.
Kenneth P. Smith, A. Kini, William Wang, Chris Wolf, M. Allen, Andrew Sillers. "Intuitive Interaction with Encrypted Query Execution in DataStorm." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.140
Mapping complex metadata structures is crucial in a number of domains, such as data integration, ontology alignment, and model management. To speed up the generation of such mappings, automatic matching systems were developed to compute mapping suggestions that can be corrected by a user. However, constructing and tuning match strategies still requires high manual effort from matching experts, as well as correct mappings against which generated mappings can be evaluated. We therefore propose a self-configuring schema matching system that automatically adapts to the given mapping problem at hand. Our approach is based on analyzing the input schemas as well as intermediate matching results. A variety of matching rules use the analysis results to automatically construct and adapt an underlying matching process for a given match task. We comprehensively evaluate our approach on different mapping problems from the schema, ontology, and model management domains. The evaluation shows that our system robustly returns good-quality mappings across different mapping problems and domains.
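As a hedged sketch of the self-configuring idea (the paper's rule set is far richer), a name-based matcher can tune its own acceptance threshold from the score distribution of the intermediate result instead of requiring an expert to pick one; the threshold heuristic below is my invention for illustration:

```python
from difflib import SequenceMatcher

def match_schemas(src, tgt, threshold=None):
    """Match source to target element names by string similarity. If no
    threshold is supplied, derive one from the intermediate results:
    midway between the best score and the mean score, so the cutoff adapts
    to how similar the two schemas happen to be overall."""
    scores = {(s, t): SequenceMatcher(None, s.lower(), t.lower()).ratio()
              for s in src for t in tgt}
    if threshold is None:
        vals = list(scores.values())
        threshold = (max(vals) + sum(vals) / len(vals)) / 2
    return sorted((s, t) for (s, t), v in scores.items() if v >= threshold)
```

Analyzing intermediate results to set parameters, as done here for a single threshold, is the same principle the system applies to assemble and adapt whole matching processes.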
E. Peukert, Julian Eberius, E. Rahm. "A Self-Configuring Schema Matching System." 2012 IEEE 28th International Conference on Data Engineering (ICDE). doi:10.1109/ICDE.2012.21